Part 1

CONTEXT: Company X owns a movie application and repository that streams movies to millions of users on a subscription basis. The company wants to automate the extraction of cast and crew information for each scene, so that when a user pauses the movie and clicks the cast-information button, the app shows details of the actors in that scene. The company has in-house computer vision and multimedia experts who need to detect faces in screenshots taken from movie scenes.

• DATA DESCRIPTION: The dataset comprises images together with masks marking where human faces appear.

• PROJECT OBJECTIVE: Detect faces in the training images.

Loading the Dataset

In [1]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [2]:
# Setting the current working directory
import os; os.chdir('drive/My Drive/CV')

Import Packages

In [3]:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
from matplotlib import pyplot
%matplotlib inline
from tensorflow.keras.applications.mobilenet import preprocess_input
import cv2
import sys
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.layers import Concatenate, UpSampling2D, Conv2D, Reshape, Activation, BatchNormalization, SpatialDropout2D
from tensorflow.keras.applications.mobilenet import MobileNet
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
import tensorflow as tf
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.backend import log, epsilon

Checking the train directory to locate the dataset file:

In [4]:
!ls
'Part 1- Train data - images.npy'
In [20]:
# Restore np.load's default arguments (mmap_mode, allow_pickle, fix_imports, encoding)
# with allow_pickle enabled, and raise the recursion limit to avoid runtime errors

np.load.__defaults__ = (None, True, True, 'ASCII')
sys.setrecursionlimit(15000)
In [6]:
np_load_old = np.load

# Wrap np.load so that allow_pickle defaults to True
# (the .npy file stores pickled Python objects)
np.load = lambda *a, **k: np_load_old(*a, **{**k, 'allow_pickle': True})


data = np.load('Part 1- Train data - images.npy', allow_pickle=True)

Let's check one sample from the loaded dataset:

In [12]:
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
plt.imshow(data[100][0])
plt.show()

Now we will resize the images to width = height = 224, and use alpha = 1 for MobileNet:

In [13]:
ALPHA = 1
IMAGE_SIZE = 224
IMAGE_HEIGHT = 224
IMAGE_WIDTH = 224

Now we will create features and labels:

In our case the feature is the image and the label is its mask. Images will be stored in the 'X' array and masks in the 'masks' array.

In [14]:
masks = np.zeros((data.shape[0], IMAGE_HEIGHT, IMAGE_WIDTH))
X = np.zeros((data.shape[0], IMAGE_HEIGHT, IMAGE_WIDTH, 3))
for index in range(data.shape[0]):
    img = data[index][0]
    img = cv2.resize(img, dsize = (IMAGE_WIDTH, IMAGE_HEIGHT), interpolation = cv2.INTER_CUBIC)
    try:
        img = img[:, :, :3]  # keep only the first three (RGB) channels
    except IndexError:
        continue  # skip images that lack a channel dimension
    X[index] = preprocess_input(np.array(img, dtype = np.float32))
    # Each annotation stores two corner points with x/y normalized to [0, 1];
    # scale them to pixel coordinates and fill the box region of the mask with 1s
    for i in data[index][1]:
        x1 = int(i['points'][0]['x'] * IMAGE_WIDTH)
        x2 = int(i['points'][1]['x'] * IMAGE_WIDTH)
        y1 = int(i['points'][0]['y'] * IMAGE_HEIGHT)
        y2 = int(i['points'][1]['y'] * IMAGE_HEIGHT)
        masks[index][y1:y2, x1:x2] = 1
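The annotation format can be illustrated on a toy example. Below is a minimal sketch (using a hypothetical 8x8 mask and one made-up box, not data from the actual file) showing how normalized corner coordinates are scaled to pixels and rasterized into a binary mask:

```python
import numpy as np

# Hypothetical toy annotation in the same shape as data[index][1]:
# each box stores two corner points with x/y normalized to [0, 1].
H = W = 8
boxes = [{'points': [{'x': 0.25, 'y': 0.25}, {'x': 0.75, 'y': 0.75}]}]

mask = np.zeros((H, W))
for box in boxes:
    x1 = int(box['points'][0]['x'] * W)   # 2
    x2 = int(box['points'][1]['x'] * W)   # 6
    y1 = int(box['points'][0]['y'] * H)   # 2
    y2 = int(box['points'][1]['y'] * H)   # 6
    mask[y1:y2, x1:x2] = 1                # fill the box region with ones

print(int(mask.sum()))  # 4x4 block of ones -> 16
```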
In [15]:
X.shape
Out[15]:
(409, 224, 224, 3)
In [16]:
masks.shape
Out[16]:
(409, 224, 224)
In [17]:
n = 100
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
_ = plt.imshow(X[n])
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
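The clipping warning appears because MobileNet's preprocess_input scales pixel values to [-1, 1], while imshow expects floats in [0, 1]. A small sketch (a hypothetical helper, only for display purposes) of how the preprocessed image could be rescaled:

```python
import numpy as np

def to_displayable(img):
    # MobileNet's preprocess_input maps pixels to [-1, 1];
    # shift and scale back to [0, 1] so imshow renders without clipping
    return (img + 1.0) / 2.0

print(to_displayable(np.array([-1.0, 0.0, 1.0])))  # [0.  0.5 1. ]
```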
In [19]:
n = 100
fig = plt.figure(figsize = (15, 7.2))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
_ = plt.imshow(masks[n])

Let's create the model

We will use MobileNet as the encoder with the following parameters:

input_shape: (IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top: False, alpha: 1.0, weights: 'imagenet'

We will then add U-Net-style decoder layers on top, with skip connections from intermediate MobileNet blocks.

In [22]:
def convolution_block(prevlayer, filters, prefix, strides=(1, 1)):
    # Conv -> BatchNorm -> ReLU, with layer names derived from the prefix
    conv = Conv2D(filters, (3, 3), padding = 'same', kernel_initializer = 'he_normal', strides = strides, name = prefix + '_conv')(prevlayer)
    conv = BatchNormalization(name = prefix + 'BatchNormalization')(conv)
    conv = Activation('relu', name = prefix + 'ActivationLayer')(conv)
    return conv

def create_model(trainable = True):
    # MobileNet encoder, pretrained on ImageNet
    model = MobileNet(input_shape = (IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top = False, alpha = ALPHA, weights = 'imagenet')
    for layer in model.layers:
        layer.trainable = trainable

    # Encoder outputs used as skip connections, from deepest (7x7) to shallowest (112x112)
    block1 = model.get_layer('conv_pw_13_relu').output
    block2 = model.get_layer('conv_pw_11_relu').output
    block3 = model.get_layer('conv_pw_5_relu').output
    block4 = model.get_layer('conv_pw_3_relu').output
    block5 = model.get_layer('conv_pw_1_relu').output

    # U-Net decoder: upsample, concatenate with the matching skip connection,
    # then apply two convolution blocks
    up1 = Concatenate()([UpSampling2D()(block1), block2])
    conv6 = convolution_block(up1, 256, 'Conv_6_1')
    conv6 = convolution_block(conv6, 256, 'Conv_6_2')

    up2 = Concatenate()([UpSampling2D()(conv6), block3])
    conv7 = convolution_block(up2, 256, 'Conv_7_1')
    conv7 = convolution_block(conv7, 256, 'Conv_7_2')

    up3 = Concatenate()([UpSampling2D()(conv7), block4])
    conv8 = convolution_block(up3, 192, 'Conv_8_1')
    conv8 = convolution_block(conv8, 128, 'Conv_8_2')

    up4 = Concatenate()([UpSampling2D()(conv8), block5])
    conv9 = convolution_block(up4, 96, 'Conv_9_1')
    conv9 = convolution_block(conv9, 64, 'Conv_9_2')

    up5 = Concatenate()([UpSampling2D()(conv9), model.input])
    conv10 = convolution_block(up5, 48, 'Conv_10_1')
    conv10 = convolution_block(conv10, 32, 'Conv_10_2')
    conv10 = SpatialDropout2D(0.2)(conv10)

    # 1x1 convolution with sigmoid gives a per-pixel face probability map
    x = Conv2D(1, (1, 1), activation = 'sigmoid')(conv10)
    x = Reshape((IMAGE_SIZE, IMAGE_SIZE))(x)
    return Model(inputs = model.input, outputs = x)
In [23]:
#Calling create_model() 

model = create_model(True)
model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf_no_top.h5
17227776/17225924 [==============================] - 0s 0us/step
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 112, 112, 32) 128         conv1[0][0]                      
__________________________________________________________________________________________________
conv1_relu (ReLU)               (None, 112, 112, 32) 0           conv1_bn[0][0]                   
__________________________________________________________________________________________________
conv_dw_1 (DepthwiseConv2D)     (None, 112, 112, 32) 288         conv1_relu[0][0]                 
__________________________________________________________________________________________________
conv_dw_1_bn (BatchNormalizatio (None, 112, 112, 32) 128         conv_dw_1[0][0]                  
__________________________________________________________________________________________________
conv_dw_1_relu (ReLU)           (None, 112, 112, 32) 0           conv_dw_1_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_1 (Conv2D)              (None, 112, 112, 64) 2048        conv_dw_1_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_1_bn (BatchNormalizatio (None, 112, 112, 64) 256         conv_pw_1[0][0]                  
__________________________________________________________________________________________________
conv_pw_1_relu (ReLU)           (None, 112, 112, 64) 0           conv_pw_1_bn[0][0]               
__________________________________________________________________________________________________
conv_pad_2 (ZeroPadding2D)      (None, 113, 113, 64) 0           conv_pw_1_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_2 (DepthwiseConv2D)     (None, 56, 56, 64)   576         conv_pad_2[0][0]                 
__________________________________________________________________________________________________
conv_dw_2_bn (BatchNormalizatio (None, 56, 56, 64)   256         conv_dw_2[0][0]                  
__________________________________________________________________________________________________
conv_dw_2_relu (ReLU)           (None, 56, 56, 64)   0           conv_dw_2_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_2 (Conv2D)              (None, 56, 56, 128)  8192        conv_dw_2_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_2_bn (BatchNormalizatio (None, 56, 56, 128)  512         conv_pw_2[0][0]                  
__________________________________________________________________________________________________
conv_pw_2_relu (ReLU)           (None, 56, 56, 128)  0           conv_pw_2_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_3 (DepthwiseConv2D)     (None, 56, 56, 128)  1152        conv_pw_2_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_3_bn (BatchNormalizatio (None, 56, 56, 128)  512         conv_dw_3[0][0]                  
__________________________________________________________________________________________________
conv_dw_3_relu (ReLU)           (None, 56, 56, 128)  0           conv_dw_3_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_3 (Conv2D)              (None, 56, 56, 128)  16384       conv_dw_3_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_3_bn (BatchNormalizatio (None, 56, 56, 128)  512         conv_pw_3[0][0]                  
__________________________________________________________________________________________________
conv_pw_3_relu (ReLU)           (None, 56, 56, 128)  0           conv_pw_3_bn[0][0]               
__________________________________________________________________________________________________
conv_pad_4 (ZeroPadding2D)      (None, 57, 57, 128)  0           conv_pw_3_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_4 (DepthwiseConv2D)     (None, 28, 28, 128)  1152        conv_pad_4[0][0]                 
__________________________________________________________________________________________________
conv_dw_4_bn (BatchNormalizatio (None, 28, 28, 128)  512         conv_dw_4[0][0]                  
__________________________________________________________________________________________________
conv_dw_4_relu (ReLU)           (None, 28, 28, 128)  0           conv_dw_4_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_4 (Conv2D)              (None, 28, 28, 256)  32768       conv_dw_4_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_4_bn (BatchNormalizatio (None, 28, 28, 256)  1024        conv_pw_4[0][0]                  
__________________________________________________________________________________________________
conv_pw_4_relu (ReLU)           (None, 28, 28, 256)  0           conv_pw_4_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_5 (DepthwiseConv2D)     (None, 28, 28, 256)  2304        conv_pw_4_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_5_bn (BatchNormalizatio (None, 28, 28, 256)  1024        conv_dw_5[0][0]                  
__________________________________________________________________________________________________
conv_dw_5_relu (ReLU)           (None, 28, 28, 256)  0           conv_dw_5_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_5 (Conv2D)              (None, 28, 28, 256)  65536       conv_dw_5_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_5_bn (BatchNormalizatio (None, 28, 28, 256)  1024        conv_pw_5[0][0]                  
__________________________________________________________________________________________________
conv_pw_5_relu (ReLU)           (None, 28, 28, 256)  0           conv_pw_5_bn[0][0]               
__________________________________________________________________________________________________
conv_pad_6 (ZeroPadding2D)      (None, 29, 29, 256)  0           conv_pw_5_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_6 (DepthwiseConv2D)     (None, 14, 14, 256)  2304        conv_pad_6[0][0]                 
__________________________________________________________________________________________________
conv_dw_6_bn (BatchNormalizatio (None, 14, 14, 256)  1024        conv_dw_6[0][0]                  
__________________________________________________________________________________________________
conv_dw_6_relu (ReLU)           (None, 14, 14, 256)  0           conv_dw_6_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_6 (Conv2D)              (None, 14, 14, 512)  131072      conv_dw_6_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_6_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_pw_6[0][0]                  
__________________________________________________________________________________________________
conv_pw_6_relu (ReLU)           (None, 14, 14, 512)  0           conv_pw_6_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_7 (DepthwiseConv2D)     (None, 14, 14, 512)  4608        conv_pw_6_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_7_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_dw_7[0][0]                  
__________________________________________________________________________________________________
conv_dw_7_relu (ReLU)           (None, 14, 14, 512)  0           conv_dw_7_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_7 (Conv2D)              (None, 14, 14, 512)  262144      conv_dw_7_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_7_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_pw_7[0][0]                  
__________________________________________________________________________________________________
conv_pw_7_relu (ReLU)           (None, 14, 14, 512)  0           conv_pw_7_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_8 (DepthwiseConv2D)     (None, 14, 14, 512)  4608        conv_pw_7_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_8_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_dw_8[0][0]                  
__________________________________________________________________________________________________
conv_dw_8_relu (ReLU)           (None, 14, 14, 512)  0           conv_dw_8_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_8 (Conv2D)              (None, 14, 14, 512)  262144      conv_dw_8_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_8_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_pw_8[0][0]                  
__________________________________________________________________________________________________
conv_pw_8_relu (ReLU)           (None, 14, 14, 512)  0           conv_pw_8_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_9 (DepthwiseConv2D)     (None, 14, 14, 512)  4608        conv_pw_8_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_9_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_dw_9[0][0]                  
__________________________________________________________________________________________________
conv_dw_9_relu (ReLU)           (None, 14, 14, 512)  0           conv_dw_9_bn[0][0]               
__________________________________________________________________________________________________
conv_pw_9 (Conv2D)              (None, 14, 14, 512)  262144      conv_dw_9_relu[0][0]             
__________________________________________________________________________________________________
conv_pw_9_bn (BatchNormalizatio (None, 14, 14, 512)  2048        conv_pw_9[0][0]                  
__________________________________________________________________________________________________
conv_pw_9_relu (ReLU)           (None, 14, 14, 512)  0           conv_pw_9_bn[0][0]               
__________________________________________________________________________________________________
conv_dw_10 (DepthwiseConv2D)    (None, 14, 14, 512)  4608        conv_pw_9_relu[0][0]             
__________________________________________________________________________________________________
conv_dw_10_bn (BatchNormalizati (None, 14, 14, 512)  2048        conv_dw_10[0][0]                 
__________________________________________________________________________________________________
conv_dw_10_relu (ReLU)          (None, 14, 14, 512)  0           conv_dw_10_bn[0][0]              
__________________________________________________________________________________________________
conv_pw_10 (Conv2D)             (None, 14, 14, 512)  262144      conv_dw_10_relu[0][0]            
__________________________________________________________________________________________________
conv_pw_10_bn (BatchNormalizati (None, 14, 14, 512)  2048        conv_pw_10[0][0]                 
__________________________________________________________________________________________________
conv_pw_10_relu (ReLU)          (None, 14, 14, 512)  0           conv_pw_10_bn[0][0]              
__________________________________________________________________________________________________
conv_dw_11 (DepthwiseConv2D)    (None, 14, 14, 512)  4608        conv_pw_10_relu[0][0]            
__________________________________________________________________________________________________
conv_dw_11_bn (BatchNormalizati (None, 14, 14, 512)  2048        conv_dw_11[0][0]                 
__________________________________________________________________________________________________
conv_dw_11_relu (ReLU)          (None, 14, 14, 512)  0           conv_dw_11_bn[0][0]              
__________________________________________________________________________________________________
conv_pw_11 (Conv2D)             (None, 14, 14, 512)  262144      conv_dw_11_relu[0][0]            
__________________________________________________________________________________________________
conv_pw_11_bn (BatchNormalizati (None, 14, 14, 512)  2048        conv_pw_11[0][0]                 
__________________________________________________________________________________________________
conv_pw_11_relu (ReLU)          (None, 14, 14, 512)  0           conv_pw_11_bn[0][0]              
__________________________________________________________________________________________________
conv_pad_12 (ZeroPadding2D)     (None, 15, 15, 512)  0           conv_pw_11_relu[0][0]            
__________________________________________________________________________________________________
conv_dw_12 (DepthwiseConv2D)    (None, 7, 7, 512)    4608        conv_pad_12[0][0]                
__________________________________________________________________________________________________
conv_dw_12_bn (BatchNormalizati (None, 7, 7, 512)    2048        conv_dw_12[0][0]                 
__________________________________________________________________________________________________
conv_dw_12_relu (ReLU)          (None, 7, 7, 512)    0           conv_dw_12_bn[0][0]              
__________________________________________________________________________________________________
conv_pw_12 (Conv2D)             (None, 7, 7, 1024)   524288      conv_dw_12_relu[0][0]            
__________________________________________________________________________________________________
conv_pw_12_bn (BatchNormalizati (None, 7, 7, 1024)   4096        conv_pw_12[0][0]                 
__________________________________________________________________________________________________
conv_pw_12_relu (ReLU)          (None, 7, 7, 1024)   0           conv_pw_12_bn[0][0]              
__________________________________________________________________________________________________
conv_dw_13 (DepthwiseConv2D)    (None, 7, 7, 1024)   9216        conv_pw_12_relu[0][0]            
__________________________________________________________________________________________________
conv_dw_13_bn (BatchNormalizati (None, 7, 7, 1024)   4096        conv_dw_13[0][0]                 
__________________________________________________________________________________________________
conv_dw_13_relu (ReLU)          (None, 7, 7, 1024)   0           conv_dw_13_bn[0][0]              
__________________________________________________________________________________________________
conv_pw_13 (Conv2D)             (None, 7, 7, 1024)   1048576     conv_dw_13_relu[0][0]            
__________________________________________________________________________________________________
conv_pw_13_bn (BatchNormalizati (None, 7, 7, 1024)   4096        conv_pw_13[0][0]                 
__________________________________________________________________________________________________
conv_pw_13_relu (ReLU)          (None, 7, 7, 1024)   0           conv_pw_13_bn[0][0]              
__________________________________________________________________________________________________
up_sampling2d (UpSampling2D)    (None, 14, 14, 1024) 0           conv_pw_13_relu[0][0]            
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 14, 14, 1536) 0           up_sampling2d[0][0]              
                                                                 conv_pw_11_relu[0][0]            
__________________________________________________________________________________________________
Conv_6_1_conv (Conv2D)          (None, 14, 14, 256)  3539200     concatenate[0][0]                
__________________________________________________________________________________________________
Conv_6_1BatchNormalization (Bat (None, 14, 14, 256)  1024        Conv_6_1_conv[0][0]              
__________________________________________________________________________________________________
Conv_6_1ActivationLayer (Activa (None, 14, 14, 256)  0           Conv_6_1BatchNormalization[0][0] 
__________________________________________________________________________________________________
Conv_6_2_conv (Conv2D)          (None, 14, 14, 256)  590080      Conv_6_1ActivationLayer[0][0]    
__________________________________________________________________________________________________
Conv_6_2BatchNormalization (Bat (None, 14, 14, 256)  1024        Conv_6_2_conv[0][0]              
__________________________________________________________________________________________________
Conv_6_2ActivationLayer (Activa (None, 14, 14, 256)  0           Conv_6_2BatchNormalization[0][0] 
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 28, 28, 256)  0           Conv_6_2ActivationLayer[0][0]    
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 28, 28, 512)  0           up_sampling2d_1[0][0]            
                                                                 conv_pw_5_relu[0][0]             
__________________________________________________________________________________________________
Conv_7_1_conv (Conv2D)          (None, 28, 28, 256)  1179904     concatenate_1[0][0]              
__________________________________________________________________________________________________
Conv_7_1BatchNormalization (Bat (None, 28, 28, 256)  1024        Conv_7_1_conv[0][0]              
__________________________________________________________________________________________________
Conv_7_1ActivationLayer (Activa (None, 28, 28, 256)  0           Conv_7_1BatchNormalization[0][0] 
__________________________________________________________________________________________________
Conv_7_2_conv (Conv2D)          (None, 28, 28, 256)  590080      Conv_7_1ActivationLayer[0][0]    
__________________________________________________________________________________________________
Conv_7_2BatchNormalization (Bat (None, 28, 28, 256)  1024        Conv_7_2_conv[0][0]              
__________________________________________________________________________________________________
Conv_7_2ActivationLayer (Activa (None, 28, 28, 256)  0           Conv_7_2BatchNormalization[0][0] 
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 56, 56, 256)  0           Conv_7_2ActivationLayer[0][0]    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 56, 56, 384)  0           up_sampling2d_2[0][0]            
                                                                 conv_pw_3_relu[0][0]             
__________________________________________________________________________________________________
Conv_8_1_conv (Conv2D)          (None, 56, 56, 192)  663744      concatenate_2[0][0]              
__________________________________________________________________________________________________
Conv_8_1BatchNormalization (Bat (None, 56, 56, 192)  768         Conv_8_1_conv[0][0]              
__________________________________________________________________________________________________
Conv_8_1ActivationLayer (Activa (None, 56, 56, 192)  0           Conv_8_1BatchNormalization[0][0] 
__________________________________________________________________________________________________
Conv_8_2_conv (Conv2D)          (None, 56, 56, 128)  221312      Conv_8_1ActivationLayer[0][0]    
__________________________________________________________________________________________________
Conv_8_2BatchNormalization (Bat (None, 56, 56, 128)  512         Conv_8_2_conv[0][0]              
__________________________________________________________________________________________________
Conv_8_2ActivationLayer (Activa (None, 56, 56, 128)  0           Conv_8_2BatchNormalization[0][0] 
__________________________________________________________________________________________________
up_sampling2d_3 (UpSampling2D)  (None, 112, 112, 128 0           Conv_8_2ActivationLayer[0][0]    
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 112, 112, 192 0           up_sampling2d_3[0][0]            
                                                                 conv_pw_1_relu[0][0]             
__________________________________________________________________________________________________
Conv_9_1_conv (Conv2D)          (None, 112, 112, 96) 165984      concatenate_3[0][0]              
__________________________________________________________________________________________________
Conv_9_1BatchNormalization (Bat (None, 112, 112, 96) 384         Conv_9_1_conv[0][0]              
__________________________________________________________________________________________________
Conv_9_1ActivationLayer (Activa (None, 112, 112, 96) 0           Conv_9_1BatchNormalization[0][0] 
__________________________________________________________________________________________________
Conv_9_2_conv (Conv2D)          (None, 112, 112, 64) 55360       Conv_9_1ActivationLayer[0][0]    
__________________________________________________________________________________________________
Conv_9_2BatchNormalization (Bat (None, 112, 112, 64) 256         Conv_9_2_conv[0][0]              
__________________________________________________________________________________________________
Conv_9_2ActivationLayer (Activa (None, 112, 112, 64) 0           Conv_9_2BatchNormalization[0][0] 
__________________________________________________________________________________________________
up_sampling2d_4 (UpSampling2D)  (None, 224, 224, 64) 0           Conv_9_2ActivationLayer[0][0]    
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 224, 224, 67) 0           up_sampling2d_4[0][0]            
                                                                 input_1[0][0]                    
__________________________________________________________________________________________________
Conv_10_1_conv (Conv2D)         (None, 224, 224, 48) 28992       concatenate_4[0][0]              
__________________________________________________________________________________________________
Conv_10_1BatchNormalization (Ba (None, 224, 224, 48) 192         Conv_10_1_conv[0][0]             
__________________________________________________________________________________________________
Conv_10_1ActivationLayer (Activ (None, 224, 224, 48) 0           Conv_10_1BatchNormalization[0][0]
__________________________________________________________________________________________________
Conv_10_2_conv (Conv2D)         (None, 224, 224, 32) 13856       Conv_10_1ActivationLayer[0][0]   
__________________________________________________________________________________________________
Conv_10_2BatchNormalization (Ba (None, 224, 224, 32) 128         Conv_10_2_conv[0][0]             
__________________________________________________________________________________________________
Conv_10_2ActivationLayer (Activ (None, 224, 224, 32) 0           Conv_10_2BatchNormalization[0][0]
__________________________________________________________________________________________________
spatial_dropout2d (SpatialDropo (None, 224, 224, 32) 0           Conv_10_2ActivationLayer[0][0]   
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 224, 224, 1)  33          spatial_dropout2d[0][0]          
__________________________________________________________________________________________________
reshape (Reshape)               (None, 224, 224)     0           conv2d[0][0]                     
==================================================================================================
Total params: 10,283,745
Trainable params: 10,258,689
Non-trainable params: 25,056
__________________________________________________________________________________________________

Defining Dice Coefficient function :

In [24]:
def dice_coefficient(y_true, y_pred):
    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred)

    return numerator / (denominator + tf.keras.backend.epsilon())
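As a quick sanity check, the same formula can be evaluated in plain NumPy on toy masks (this snippet is purely illustrative and is not part of the training pipeline):

```python
import numpy as np

def dice_np(y_true, y_pred, eps=1e-7):
    # Same formula as the Keras version above: 2*|A intersect B| / (|A| + |B|)
    numerator = 2 * np.sum(y_true * y_pred)
    denominator = np.sum(y_true + y_pred)
    return numerator / (denominator + eps)

a = np.array([[1, 1], [0, 0]], dtype=float)
b = np.array([[1, 0], [1, 0]], dtype=float)

print(round(dice_np(a, a), 4))  # identical masks -> 1.0 (up to epsilon)
print(round(dice_np(a, b), 4))  # half-overlapping masks -> 0.5
```

Perfect overlap gives a coefficient of 1, no overlap gives 0, which is why the loss below subtracts its log.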

Defining Loss :

In [25]:
def loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) - log(dice_coefficient(y_true, y_pred) + epsilon())

Now we will compile the model using the following parameters :

loss: the custom loss function defined above

optimizer: the Adam optimizer

metrics: the dice_coefficient function defined above

In [26]:
adam = Adam(lr = 1e-4, beta_1 = 0.9, beta_2 = 0.999, epsilon = None, decay = 0.0, amsgrad = False)
model.compile(loss = loss, optimizer = adam, metrics = [dice_coefficient])

Defining Checkpoint and EarlyStopping :

Early Stopping monitors the performance of the model after every epoch on a held-out validation set during training, and terminates training based on the validation performance, thereby also combatting overfitting.

As the epochs go by, the algorithm learns and its error on the training set naturally goes down, and so does its error on the validation set. After a while, however, the validation error stops decreasing and actually starts to rise again. This indicates that the model has started to overfit the training data. With Early Stopping, you simply stop training as soon as the validation error reaches its minimum.
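The patience mechanism behind this can be sketched in a few lines of plain Python (a simplified illustration of the idea, not Keras's actual implementation):

```python
def early_stop_epoch(losses, patience=5):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once the monitored loss has failed to improve on its best
    value for `patience` consecutive epochs (simplified sketch of the callback).
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Loss improves for 3 epochs, then plateaus: training stops 5 epochs after the last improvement
print(early_stop_epoch([0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7], patience=5))  # -> 8
```

This mirrors the `patience = 5` setting passed to `EarlyStopping` in the cell below.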

In [27]:
checkpoint = ModelCheckpoint('model_{loss:.2f}.h5', monitor = 'loss', verbose = 1, save_best_only = True, save_weights_only = True, mode = 'min', period = 1)
stop = EarlyStopping(monitor = 'loss', patience = 5, mode = 'min')
reduce_lr = ReduceLROnPlateau(monitor = 'loss', factor = 0.2, patience = 5, min_lr = 1e-6, verbose = 1, mode = 'min')
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.

Fit the model

batch_size: 1

callbacks: checkpoint, reduce_lr, stop

We will split the data into training and validation sets in a 70 : 30 ratio.

In [32]:
X_train, X_valid, y_train, y_valid = train_test_split(X, masks, test_size = 0.30, random_state = 2019, shuffle = False)
X_train.shape, X_valid.shape, y_train.shape, y_valid.shape
Out[32]:
((286, 224, 224, 3), (123, 224, 224, 3), (286, 224, 224), (123, 224, 224))
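The shapes above are consistent with a 70 : 30 split of the 409 samples; scikit-learn rounds the test fraction up, which can be checked with a bit of arithmetic (assuming n = 409, as implied by the output above):

```python
import math

n = 409                        # total number of samples
n_valid = math.ceil(n * 0.30)  # train_test_split rounds the test share up
n_train = n - n_valid
print(n_train, n_valid)  # -> 286 123
```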
In [33]:
model.fit(X_train, y_train, epochs = 30, batch_size = 1, callbacks = [checkpoint, reduce_lr, stop], validation_data = (X_valid, y_valid))
Epoch 1/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0862 - dice_coefficient: 0.9768 - val_loss: 1.1210 - val_dice_coefficient: 0.6558

Epoch 00001: loss did not improve from 0.08080
Epoch 2/30
286/286 [==============================] - 12s 40ms/step - loss: 0.0719 - dice_coefficient: 0.9856 - val_loss: 1.1276 - val_dice_coefficient: 0.6565

Epoch 00002: loss improved from 0.08080 to 0.07194, saving model to model_0.07.h5
Epoch 3/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0668 - dice_coefficient: 0.9890 - val_loss: 1.1353 - val_dice_coefficient: 0.6594

Epoch 00003: loss improved from 0.07194 to 0.06680, saving model to model_0.07.h5
Epoch 4/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0634 - dice_coefficient: 0.9915 - val_loss: 1.1664 - val_dice_coefficient: 0.6578

Epoch 00004: loss improved from 0.06680 to 0.06339, saving model to model_0.06.h5
Epoch 5/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0633 - dice_coefficient: 0.9914 - val_loss: 1.1886 - val_dice_coefficient: 0.6585

Epoch 00005: loss improved from 0.06339 to 0.06333, saving model to model_0.06.h5
Epoch 6/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0615 - dice_coefficient: 0.9928 - val_loss: 1.1632 - val_dice_coefficient: 0.6615

Epoch 00006: loss improved from 0.06333 to 0.06145, saving model to model_0.06.h5
Epoch 7/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0607 - dice_coefficient: 0.9934 - val_loss: 1.2014 - val_dice_coefficient: 0.6601

Epoch 00007: loss improved from 0.06145 to 0.06073, saving model to model_0.06.h5
Epoch 8/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0603 - dice_coefficient: 0.9937 - val_loss: 1.1457 - val_dice_coefficient: 0.6665

Epoch 00008: loss improved from 0.06073 to 0.06035, saving model to model_0.06.h5
Epoch 9/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0610 - dice_coefficient: 0.9932 - val_loss: 1.1221 - val_dice_coefficient: 0.6723

Epoch 00009: loss did not improve from 0.06035
Epoch 10/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0598 - dice_coefficient: 0.9942 - val_loss: 1.1519 - val_dice_coefficient: 0.6712

Epoch 00010: loss improved from 0.06035 to 0.05979, saving model to model_0.06.h5
Epoch 11/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0613 - dice_coefficient: 0.9930 - val_loss: 1.0908 - val_dice_coefficient: 0.6774

Epoch 00011: loss did not improve from 0.05979
Epoch 12/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0608 - dice_coefficient: 0.9934 - val_loss: 1.0987 - val_dice_coefficient: 0.6793

Epoch 00012: loss did not improve from 0.05979
Epoch 13/30
286/286 [==============================] - 11s 40ms/step - loss: 0.0604 - dice_coefficient: 0.9939 - val_loss: 1.1207 - val_dice_coefficient: 0.6730

Epoch 00013: loss did not improve from 0.05979
Epoch 14/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0616 - dice_coefficient: 0.9933 - val_loss: 1.1590 - val_dice_coefficient: 0.6714

Epoch 00014: loss did not improve from 0.05979
Epoch 15/30
286/286 [==============================] - 11s 39ms/step - loss: 0.0618 - dice_coefficient: 0.9932 - val_loss: 1.0852 - val_dice_coefficient: 0.6831

Epoch 00015: loss did not improve from 0.05979

Epoch 00015: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Out[33]:
<tensorflow.python.keras.callbacks.History at 0x7f121200e390>

We see that, due to EarlyStopping, training terminated at epoch 15 once the loss had failed to improve for 5 consecutive epochs. This saves computational resources and also prevents overfitting of the model.

In [34]:
model.evaluate(X_valid, y_valid, verbose = 1)
4/4 [==============================] - 4s 1s/step - loss: 0.8607 - dice_coefficient: 0.6846
Out[34]:
[0.86069655418396, 0.6846415996551514]

Now we will get the predicted mask for a sample image :

In [35]:
# Load previously saved model weights
WEIGHTS_FILE = "model_0.11.h5"
learned_model = create_model()
learned_model.load_weights(WEIGHTS_FILE)
y_pred = learned_model.predict(X_valid, verbose = 1)
4/4 [==============================] - 1s 229ms/step
In [56]:
# For a sample image
n = 70
image = cv2.resize(X_valid[n], dsize = (IMAGE_HEIGHT, IMAGE_WIDTH), interpolation = cv2.INTER_CUBIC)
# Binarise the predicted mask with a 0.1 threshold and resize it to the image size
pred_mask = cv2.resize(1.0*(y_pred[n] > 0.1), (IMAGE_WIDTH, IMAGE_HEIGHT))

# Apply the mask to each colour channel (copy to avoid mutating the original image)
image2 = image.copy()
image2[:,:,0] = pred_mask*image[:,:,0]
image2[:,:,1] = pred_mask*image[:,:,1]
image2[:,:,2] = pred_mask*image[:,:,2]
out_image = image2

fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
plt.imshow(out_image)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Out[56]:
<matplotlib.image.AxesImage at 0x7f119165c610>
In [57]:
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
plt.imshow(pred_mask, alpha = 1)
Out[57]:
<matplotlib.image.AxesImage at 0x7f11bc04f9d0>

Superimpose the mask on the image

In [58]:
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
plt.imshow(X_valid[n])
plt.savefig('image.jpg', bbox_inches = 'tight', pad_inches = 0)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
In [59]:
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
plt.axis('off')
plt.imshow(y_pred[n], alpha = 0.8)
plt.savefig('mask.jpg', bbox_inches = 'tight', pad_inches = 0)
In [60]:
from google.colab.patches import cv2_imshow
img = cv2.imread('image.jpg', 1)
mask = cv2.imread('mask.jpg', 1)
img = cv2.add(img, mask)
cv2_imshow(img)

Conclusion

The project revolved around face detection, primarily designed for OTT platforms, where the faces of the cast can be recognised. Amazon Prime offers the same feature: when we pause a movie or series, we see details of the actors present in the current frame.

Here we used a pretrained MobileNet (transfer learning) and added U-Net-style decoder layers on top to train, fit and evaluate the model. The objective of the model is to predict the boundaries (masks, in our case) around the face in a given image.

We used binary cross entropy (combined with a dice term) as the loss, the Adam optimizer, and the dice coefficient as the performance metric.

Our final metrics on the validation set are :

loss: 0.8607 - dice_coefficient: 0.6846

Saved model weights were loaded and used to predict masks on the validation data, after which we visualised a sample image and superimposed its predicted mask.

Overall, we can see substantial overlap between the images and predicted masks above, so we can say that our model performs reasonably well at predicting the masks.

------------------------------------------------------------X------------------------------------------------------------------------------------------------X-------------------------------------------------------------------

Part 2

CONTEXT: Company X owns a movie application and repository which caters movie streaming to millions of users on a subscription basis. The company wants to automate the process of providing cast and crew information for each scene in a movie, such that when a user pauses the movie and clicks the cast information button, the app shows details of the actors in the scene. The company has in-house computer vision and multimedia experts who need to detect faces in screenshots from movie scenes.

In [1]:
!pip install mtcnn
Collecting mtcnn
  Downloading https://files.pythonhosted.org/packages/67/43/abee91792797c609c1bf30f1112117f7a87a713ebaa6ec5201d5555a73ef/mtcnn-0.1.0-py3-none-any.whl (2.3MB)
     |████████████████████████████████| 2.3MB 7.5MB/s 
Requirement already satisfied: keras>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from mtcnn) (2.4.3)
Requirement already satisfied: opencv-python>=4.1.0 in /usr/local/lib/python3.7/dist-packages (from mtcnn) (4.1.2.30)
Requirement already satisfied: h5py in /usr/local/lib/python3.7/dist-packages (from keras>=2.0.0->mtcnn) (2.10.0)
Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.7/dist-packages (from keras>=2.0.0->mtcnn) (1.4.1)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from keras>=2.0.0->mtcnn) (3.13)
Requirement already satisfied: numpy>=1.9.1 in /usr/local/lib/python3.7/dist-packages (from keras>=2.0.0->mtcnn) (1.19.5)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from h5py->keras>=2.0.0->mtcnn) (1.15.0)
Installing collected packages: mtcnn
Successfully installed mtcnn-0.1.0
In [3]:
from google.colab import drive
drive.mount('/content/drive/')
Mounted at /content/drive/

Extracting Zip file :

In [4]:
import zipfile
data_dir ='/content/drive/MyDrive/CV/Part2/Part 2 - training images.zip'
archive = zipfile.ZipFile(data_dir, 'r')
archive.extractall()
In [7]:
# extracting and plotting each detected face in a photograph 

from matplotlib import pyplot
from matplotlib import patches
from matplotlib.patches import Rectangle
from matplotlib.patches import Circle
from mtcnn.mtcnn import MTCNN
import math as mt
import pandas as pd
import os as os

Function for extracting faces :

In [6]:
def extract_faces(df):
  # We will be using MTCNN as the face detector
  detector = MTCNN()
  data_dir = '/content/training_images'
  image_files = os.listdir(data_dir)
  print(image_files)
  count = 0
  for i in image_files:
    data = pyplot.imread(data_dir + '/' + i)
    faces = detector.detect_faces(data)

    # record the bounding box of each detected face
    for j in range(len(faces)):
      x1, y1, width, height = faces[j]['box']
      df.loc[count] = [abs(x1), abs(y1), abs(width), abs(height), len(faces), i]
      count = count + 1
  print(df)
In [9]:
#calling extract_faces
df = pd.DataFrame(columns=['x','y','w','h','total_faces','Image_names'])

extract_faces(df)
['real_00806.jpg', 'real_00714.jpg', 'real_00462.jpg', 'real_00009.jpg', 'real_00215.jpg', 'real_00592.jpg', 'real_00494(1).jpg', 'real_00960.jpg', 'real_00469.jpg', 'real_00398.jpg', 'real_00102.jpg', 'real_00836.jpg', 'real_00553.jpg', 'real_01025.jpg', 'real_00422.jpg', 'real_01041.jpg', 'real_00631.jpg', 'real_00401.jpg', 'real_00204.jpg', 'real_00678.jpg', 'real_00198.jpg', 'real_00287.jpg', 'real_00662.jpg', 'real_00305.jpg', 'real_00886.jpg', 'real_00931.jpg', 'real_00495(1).jpg', 'real_00481(1).jpg', 'real_00945.jpg', 'real_00190.jpg', 'real_00708.jpg', 'real_00264.jpg', 'real_00965.jpg', 'real_00820.jpg', 'real_00199.jpg', 'real_00022.jpg', 'real_00500.jpg', 'real_00315.jpg', 'real_00712.jpg', 'real_00558.jpg', 'real_00875.jpg', 'real_00019.jpg', 'real_00435.jpg', 'real_00655.jpg', 'real_00811.jpg', 'real_00760.jpg', 'real_00510.jpg', 'real_00361.jpg', 'real_00052.jpg', 'real_00684.jpg', 'real_00583.jpg', 'real_00969.jpg', 'real_00122.jpg', 'real_00217.jpg', 'real_00618.jpg', 'real_00830.jpg', 'real_00224.jpg', 'real_00166.jpg', 'real_00441.jpg', 'real_00307.jpg', 'real_00555.jpg', 'real_00089.jpg', 'real_00810.jpg', 'real_01066.jpg', 'real_00380.jpg', 'real_00627.jpg', 'real_00824.jpg', 'real_00362.jpg', 'real_00781.jpg', 'real_00949.jpg', 'real_00048.jpg', 'real_00674.jpg', 'real_00689.jpg', 'real_00383.jpg', 'real_00474.jpg', 'real_00825.jpg', 'real_00056.jpg', 'real_00928.jpg', 'real_01018.jpg', 'real_00175.jpg', 'real_00975.jpg', 'real_00841.jpg', 'real_00600.jpg', 'real_00970.jpg', 'real_00202.jpg', 'real_00002.jpg', 'real_00370.jpg', 'real_00508.jpg', 'real_00348.jpg', 'real_00742.jpg', 'real_00612.jpg', 'real_00342.jpg', 'real_00060.jpg', 'real_00693.jpg', 'real_00187.jpg', 'real_00543.jpg', 'real_00294.jpg', 'real_00282.jpg', 'real_00070.jpg', 'real_00126.jpg', 'real_00670.jpg', 'real_00716.jpg', 'real_01032.jpg', 'real_00754.jpg', 'real_00297.jpg', 'real_00108.jpg', 'real_00026.jpg', 'real_00159.jpg', 'real_00341.jpg', 'real_00605.jpg', 
'real_00271.jpg', 'real_01076.jpg', 'real_00393.jpg', 'real_00092.jpg', 'real_00934.jpg', 'real_00405.jpg', 'real_00509.jpg', 'real_00897.jpg', 'real_00757.jpg', 'real_00259.jpg', 'real_00878.jpg', 'real_00654.jpg', 'real_00514.jpg', 'real_00624.jpg', 'real_00450.jpg', 'real_00611.jpg', 'real_00823.jpg', 'real_01059.jpg', 'real_00930.jpg', 'real_01054.jpg', 'real_00548.jpg', 'real_01019.jpg', 'real_00316.jpg', 'real_00789.jpg', 'real_00366.jpg', 'real_00068.jpg', 'real_00452.jpg', 'real_00274.jpg', 'real_01012.jpg', 'real_00255.jpg', 'real_00803.jpg', 'real_00906.jpg', 'real_00409.jpg', 'real_00677.jpg', 'real_00047.jpg', 'real_00270.jpg', 'real_00426.jpg', 'real_00326.jpg', 'real_00958.jpg', 'real_00927.jpg', 'real_00248.jpg', 'real_00324.jpg', 'real_00493.jpg', 'real_00499.jpg', 'real_00629.jpg', 'real_01063.jpg', 'real_00168.jpg', 'real_00164.jpg', 'real_00463.jpg', 'real_00604.jpg', 'real_00456.jpg', 'real_00819.jpg', 'real_00881.jpg', 'real_00123.jpg', 'real_00866.jpg', 'real_00502.jpg', 'real_00355.jpg', 'real_00191.jpg', 'real_00807.jpg', 'real_00114.jpg', 'real_01001.jpg', 'real_00032.jpg', 'real_00890.jpg', 'real_00481.jpg', 'real_01036.jpg', 'real_00703.jpg', 'real_00531.jpg', 'real_00891.jpg', 'real_00739.jpg', 'real_00037.jpg', 'real_01049.jpg', 'real_00046.jpg', 'real_00153.jpg', 'real_00461.jpg', 'real_00178.jpg', 'real_00539.jpg', 'real_00560.jpg', 'real_01070.jpg', 'real_00635.jpg', 'real_00301.jpg', 'real_01016.jpg', 'real_00407.jpg', 'real_00535.jpg', 'real_00943.jpg', 'real_01010.jpg', 'real_00692.jpg', 'real_00551.jpg', 'real_00298.jpg', 'real_00637.jpg', 'real_01069.jpg', 'real_00455.jpg', 'real_00855.jpg', 'real_00115.jpg', 'real_00072.jpg', 'real_00453.jpg', 'real_00957.jpg', 'real_00448.jpg', 'real_00632.jpg', 'real_00484.jpg', 'real_00471.jpg', 'real_00732.jpg', 'real_00119.jpg', 'real_01039.jpg', 'real_00901.jpg', 'real_00686.jpg', 'real_00193.jpg', 'real_00105.jpg', 'real_01045.jpg', 'real_00601.jpg', 'real_00938.jpg', 'real_00541.jpg', 
'real_00812.jpg', 'real_00575.jpg', 'real_00317.jpg', 'real_00209.jpg', 'real_00997.jpg', 'real_00923.jpg', 'real_00615.jpg', 'real_00976.jpg', 'real_00771.jpg', 'real_00378.jpg', 'real_00015.jpg', 'real_00340.jpg', 'real_00863.jpg', 'real_00648.jpg', 'real_00331.jpg', 'real_00844.jpg', 'real_00984.jpg', 'real_01071.jpg', 'real_00663.jpg', 'real_00388.jpg', 'real_00750.jpg', 'real_00529.jpg', 'real_00898.jpg', 'real_01047.jpg', 'real_00877.jpg', 'real_00964.jpg', 'real_00238.jpg', 'real_00650.jpg', 'real_00146.jpg', 'real_00917.jpg', 'real_00731.jpg', 'real_00764.jpg', 'real_00879.jpg', 'real_00719.jpg', 'real_00263.jpg', 'real_00475.jpg', 'real_00873.jpg', 'real_00542.jpg', 'real_01007.jpg', 'real_00659.jpg', 'real_00384.jpg', 'real_00323.jpg', 'real_00603.jpg', 'real_00972.jpg', 'real_00064.jpg', 'real_00598.jpg', 'real_00569.jpg', 'real_00464.jpg', 'real_00161.jpg', 'real_00496.jpg', 'real_01072.jpg', 'real_00397.jpg', 'real_00179.jpg', 'real_00832.jpg', 'real_00162.jpg', 'real_00826.jpg', 'real_00835.jpg', 'real_00096.jpg', 'real_00406.jpg', 'real_00795.jpg', 'real_01006.jpg', 'real_00241.jpg', 'real_00538.jpg', 'real_00390.jpg', 'real_00081.jpg', 'real_00813.jpg', 'real_00219.jpg', 'real_00796.jpg', 'real_01074.jpg', 'real_00279.jpg', 'real_00888.jpg', 'real_00647.jpg', 'real_00100.jpg', 'real_00561.jpg', 'real_00602.jpg', 'real_00829.jpg', 'real_00852.jpg', 'real_00735.jpg', 'real_00427.jpg', 'real_00277.jpg', 'real_00242.jpg', 'real_00051.jpg', 'real_00293.jpg', 'real_00848.jpg', 'real_00779.jpg', 'real_00786.jpg', 'real_00038.jpg', 'real_00139.jpg', 'real_00184.jpg', 'real_00649.jpg', 'real_00588.jpg', 'real_00740.jpg', 'real_00497.jpg', 'real_00210.jpg', 'real_00567.jpg', 'real_00321.jpg', 'real_00582.jpg', 'real_00087.jpg', 'real_00745.jpg', 'real_00775.jpg', 'real_00925.jpg', 'real_00523.jpg', 'real_00377.jpg', 'real_00706.jpg', 'real_00506.jpg', 'real_00704.jpg', 'real_00300.jpg', 'real_00621.jpg', 'real_00644.jpg', 'real_01053.jpg', 'real_00012.jpg', 
'real_01038.jpg', 'real_00911.jpg', 'real_00907.jpg', 'real_00851.jpg', 'real_01029.jpg', 'real_00268.jpg', 'real_00518.jpg', 'real_00239.jpg', 'real_01002.jpg', 'real_00063.jpg', 'real_01024.jpg', 'real_00676.jpg', 'real_00669.jpg', 'real_00738.jpg', 'real_01003.jpg', 'real_00225.jpg', 'real_00946.jpg', 'real_00145.jpg', 'real_00328.jpg', 'real_01065.jpg', 'real_00118.jpg', 'real_00154.jpg', 'real_00439.jpg', 'real_00478.jpg', 'real_00554.jpg', 'real_00981.jpg', 'real_00243.jpg', 'real_00625.jpg', 'real_00908.jpg', 'real_00157.jpg', 'real_00386.jpg', 'real_01027.jpg', 'real_00711.jpg', 'real_00860.jpg', 'real_00256.jpg', 'real_01051.jpg', 'real_00524.jpg', 'real_00926.jpg', 'real_00414.jpg', 'real_00549.jpg', 'real_01033.jpg', 'real_00980.jpg', 'real_00033.jpg', 'real_00885.jpg', 'real_00336.jpg', 'real_00504.jpg', 'real_00986.jpg', 'real_00675.jpg', 'real_00088.jpg', 'real_00791.jpg', 'real_00004.jpg', 'real_00713.jpg', 'real_00021.jpg', 'real_00652.jpg', 'real_00729.jpg', 'real_00697.jpg', 'real_00657.jpg', 'real_00850.jpg', 'real_01004.jpg', 'real_00753.jpg', 'real_00431.jpg', 'real_00968.jpg', 'real_00770.jpg', 'real_00113.jpg', 'real_00594.jpg', 'real_00097.jpg', 'real_00240.jpg', 'real_00622.jpg', 'real_00466.jpg', 'real_00565.jpg', 'real_00821.jpg', 'real_00028.jpg', 'real_00914.jpg', 'real_00479.jpg', 'real_00977.jpg', 'real_00967.jpg', 'real_00147.jpg', 'real_00080.jpg', 'real_00039.jpg', 'real_00797.jpg', 'real_00335.jpg', 'real_01062.jpg', 'real_00774.jpg', 'real_00718.jpg', 'real_00801.jpg', 'real_00691.jpg', 'real_00838.jpg', 'real_00973.jpg', 'real_00132.jpg', 'real_00289.jpg', 'real_00780.jpg', 'real_00519.jpg', 'real_00788.jpg', 'real_00413.jpg', 'real_00837.jpg', 'real_00265.jpg', 'real_00935.jpg', 'real_00343.jpg', 'real_00445.jpg', 'real_00725.jpg', 'real_00251.jpg', 'real_00296.jpg', 'real_00566.jpg', 'real_00232.jpg', 'real_01008.jpg', 'real_00842.jpg', 'real_00330.jpg', 'real_00724.jpg', 'real_00822.jpg', 'real_00109.jpg', 'real_00112.jpg', 
'real_00389.jpg', 'real_00189.jpg', 'real_00208.jpg', 'real_00158.jpg', 'real_00941.jpg', 'real_00591.jpg', 'real_00346.jpg', 'real_00767.jpg', 'real_01005.jpg', 'real_00576.jpg', 'real_00160.jpg', 'real_00577.jpg', 'real_00425.jpg', 'real_00432.jpg', 'real_00418.jpg', 'real_00847.jpg', 'real_00246.jpg', 'real_00533.jpg', 'real_01057.jpg', 'real_00537.jpg', 'real_00595.jpg', 'real_00156.jpg', 'real_00236.jpg', 'real_00331(1).jpg', 'real_00953.jpg', 'real_00295.jpg', 'real_00353.jpg', 'real_00628.jpg', 'real_00579.jpg', 'real_00438.jpg', 'real_00423.jpg', 'real_00421.jpg', 'real_01013.jpg', 'real_00682.jpg', 'real_00428.jpg', 'real_00990.jpg', 'real_00392.jpg', 'real_00140.jpg', 'real_01011.jpg', 'real_00747.jpg', 'real_00979.jpg', 'real_00086.jpg', 'real_00001.jpg', 'real_00192.jpg', 'real_00900.jpg', 'real_00318(1).jpg', 'real_00861.jpg', 'real_00759.jpg', 'real_00840.jpg', 'real_00871.jpg', 'real_00889.jpg', 'real_00103.jpg', 'real_00653.jpg', 'real_00617.jpg', 'real_00785.jpg', 'real_00111.jpg', 'real_00778.jpg', 'real_00288.jpg', 'real_00507.jpg', 'real_00319(1).jpg', 'real_00350.jpg', 'real_00302.jpg', 'real_00869.jpg', 'real_00436.jpg', 'real_00473.jpg', 'real_00614.jpg', 'real_00942.jpg', 'real_00748.jpg', 'real_00250.jpg', 'real_00062.jpg', 'real_00127.jpg', 'real_00443.jpg', 'real_01037.jpg', 'real_00261.jpg', 'real_00843.jpg', 'real_00809.jpg', 'real_00176.jpg', 'real_00734.jpg', 'real_00382.jpg', 'real_00098.jpg', 'real_00808.jpg', 'real_00360.jpg', 'real_00216.jpg', 'real_00939.jpg', 'real_00094.jpg', 'real_00936.jpg', 'real_00864.jpg', 'real_00391.jpg', 'real_00235.jpg', 'real_01013(1).jpg', 'real_00120.jpg', 'real_00528.jpg', 'real_00513.jpg', 'real_00065.jpg', 'real_00010.jpg', 'real_00606.jpg', 'real_00783.jpg', 'real_00916.jpg', 'real_00597.jpg', 'real_00099.jpg', 'real_00954.jpg', 'real_00025.jpg', 'real_00220.jpg', 'real_00292.jpg', 'real_00640.jpg', 'real_00278.jpg', 'real_00071.jpg', 'real_00733.jpg', 'real_00511.jpg', 'real_00580.jpg', 
'real_00043.jpg', 'real_00303.jpg', 'real_00082.jpg', 'real_00359.jpg', 'real_00333.jpg', 'real_00131.jpg', 'real_00442.jpg', 'real_00956.jpg', 'real_00952.jpg', 'real_00966.jpg', 'real_00487.jpg', 'real_00685.jpg', 'real_00369.jpg', 'real_00363.jpg', 'real_00044.jpg', 'real_00006.jpg', 'real_00310.jpg', 'real_00318.jpg', 'real_00608.jpg', 'real_00420.jpg', 'real_00234.jpg', 'real_00547.jpg', 'real_00149.jpg', 'real_00773.jpg', 'real_00352.jpg', 'real_00130.jpg', 'real_00077.jpg', 'real_00833.jpg', 'real_00633.jpg', 'real_00915.jpg', 'real_00944.jpg', 'real_00991.jpg', 'real_00133.jpg', 'real_00978.jpg', 'real_01035.jpg', 'real_00903.jpg', 'real_00075.jpg', 'real_00222.jpg', 'real_00920.jpg', 'real_00717.jpg', 'real_00312.jpg', 'real_00227.jpg', 'real_00058.jpg', 'real_00988.jpg', 'real_00710.jpg', 'real_01050.jpg', 'real_00018.jpg', 'real_00839.jpg', 'real_00634.jpg', 'real_00011.jpg', 'real_00932.jpg', 'real_00491.jpg', 'real_00231.jpg', 'real_00515.jpg', 'real_00982.jpg', 'real_00870.jpg', 'real_01075.jpg', 'real_00247.jpg', 'real_00777.jpg', 'real_00399.jpg', 'real_00899.jpg', 'real_00737.jpg', 'real_00410.jpg', 'real_00752.jpg', 'real_00859.jpg', 'real_00254.jpg', 'real_00183.jpg', 'real_00761.jpg', 'real_01052.jpg', 'real_00845.jpg', 'real_00163.jpg', 'real_00544.jpg', 'real_00894.jpg', 'real_00570.jpg', 'real_00035.jpg', 'real_00589.jpg', 'real_01048.jpg', 'real_00828.jpg', 'real_00849.jpg', 'real_00104.jpg', 'real_00909.jpg', 'real_00593.jpg', 'real_00299.jpg', 'real_00792.jpg', 'real_01046.jpg', 'real_00354.jpg', 'real_00755.jpg', 'real_00444.jpg', 'real_00971.jpg', 'real_00701.jpg', 'real_00562.jpg', 'real_01007(1).jpg', 'real_00148.jpg', 'real_00961.jpg', 'real_00419.jpg', 'real_00884.jpg', 'real_00730.jpg', 'real_00206.jpg', 'real_00314.jpg', 'real_00636.jpg', 'real_00385.jpg', 'real_00872.jpg', 'real_00373.jpg', 'real_00963.jpg', 'real_00486.jpg', 'real_00664.jpg', 'real_00921.jpg', 'real_01034.jpg', 'real_01000.jpg', 'real_00584.jpg', 
'real_00658.jpg', 'real_00672.jpg', 'real_00918.jpg', 'real_01060.jpg', 'real_00258.jpg', 'real_00995.jpg', 'real_00853.jpg', 'real_00079.jpg', 'real_00590.jpg', 'real_00671.jpg', 'real_00306.jpg', 'real_00357.jpg', 'real_00769.jpg', 'real_00867.jpg', 'real_00134.jpg', 'real_00599.jpg', 'real_00309.jpg', 'real_00228.jpg', 'real_00667.jpg', 'real_00040.jpg', 'real_00892.jpg', 'real_00411.jpg', 'real_00996.jpg', 'real_00412.jpg', 'real_00776.jpg', 'real_00705.jpg', 'real_00571.jpg', 'real_00962.jpg', 'real_00446.jpg', 'real_00668.jpg', 'real_00449.jpg', 'real_01078.jpg', 'real_00482.jpg', 'real_00329.jpg', 'real_00815.jpg', 'real_00494.jpg', 'real_00846.jpg', 'real_00173.jpg', 'real_00578.jpg', 'real_01017.jpg', 'real_00516.jpg', 'real_00205.jpg', 'real_00556.jpg', 'real_00024.jpg', 'real_00116.jpg', 'real_00036.jpg', 'real_00375.jpg', 'real_00715.jpg', 'real_01081.jpg', 'real_01058.jpg', 'real_00613.jpg', 'real_00396.jpg', 'real_00272.jpg', 'real_01044.jpg', 'real_00527.jpg', 'real_00817.jpg', 'real_01080.jpg', 'real_00913.jpg', 'real_00171.jpg', 'real_00495.jpg', 'real_01061.jpg', 'real_00030.jpg', 'real_00181.jpg', 'real_00787.jpg', 'real_00141.jpg', 'real_00253.jpg', 'real_01015.jpg', 'real_01055.jpg', 'real_00339.jpg', 'real_00762.jpg', 'real_00203.jpg', 'real_00262.jpg', 'real_00029.jpg', 'real_00721.jpg', 'real_00862.jpg', 'real_00172.jpg', 'real_00275.jpg', 'real_00987.jpg', 'real_00726.jpg', 'real_00345.jpg', 'real_00772.jpg', 'real_00882.jpg', 'real_00344.jpg', 'real_00992.jpg', 'real_00799.jpg', 'real_00933.jpg', 'real_01040.jpg', 'real_00651.jpg', 'real_00720.jpg', 'real_00073.jpg', 'real_00257.jpg', 'real_00273.jpg', 'real_00814.jpg', 'real_00381.jpg', 'real_00085.jpg', 'real_00434.jpg', 'real_00673.jpg', 'real_00766.jpg', 'real_00572.jpg', 'real_00424.jpg', 'real_00498.jpg', 'real_00286.jpg', 'real_00630.jpg', 'real_00831.jpg', 'real_01021.jpg', 'real_00804.jpg', 'real_00702.jpg', 'real_00374.jpg', 'real_00741.jpg', 'real_00155.jpg', 'real_01026.jpg', 
'real_00223.jpg', 'real_00151.jpg', 'real_00905.jpg', 'real_00573.jpg', 'real_00661.jpg', 'real_00429.jpg', 'real_00522.jpg', 'real_00280.jpg', 'real_00283.jpg', 'real_00327.jpg', 'real_00129.jpg', 'real_00135.jpg', 'real_00404.jpg', 'real_00782.jpg', 'real_00865.jpg', 'real_00950.jpg', 'real_00722.jpg', 'real_00201.jpg', 'real_00818.jpg', 'real_00910.jpg', 'real_00076.jpg', 'real_00351.jpg', 'real_00244.jpg', 'real_00483.jpg', 'real_00857.jpg', 'real_00356.jpg', 'real_00765.jpg', 'real_01056.jpg', 'real_00526.jpg', 'real_00645.jpg', 'real_00128.jpg', 'real_00727.jpg', 'real_00530.jpg', 'real_00180.jpg', 'real_00768.jpg', 'real_00937.jpg', 'real_00896.jpg', 'real_00643.jpg', 'real_00188.jpg', 'real_00610.jpg', 'real_00218.jpg', 'real_00454.jpg', 'real_00895.jpg', 'real_01077.jpg', 'real_00794.jpg', 'real_00137.jpg', 'real_00736.jpg', 'real_00083.jpg', 'real_00322.jpg', 'real_00174.jpg', 'real_00368.jpg', 'real_00619.jpg', 'real_00197.jpg', 'real_00709.jpg', 'real_01030.jpg', 'real_00763.jpg', 'real_00091.jpg', 'real_00027.jpg', 'real_00402.jpg', 'real_00124.jpg', 'real_00014.jpg', 'real_00743.jpg', 'real_00744.jpg', 'real_00858.jpg', 'real_00929.jpg', 'real_00182.jpg', 'real_00230.jpg', 'real_00311.jpg', 'real_00746.jpg', 'real_00338.jpg', 'real_00728.jpg', 'real_00233.jpg', 'real_00532.jpg', 'real_00802.jpg', 'real_00924.jpg', 'real_00177.jpg', 'real_00688.jpg', 'real_00517.jpg', 'real_00999.jpg', 'real_00285.jpg', 'real_00989.jpg', 'real_00459.jpg', 'real_00587.jpg', 'real_00347.jpg', 'real_00023.jpg', 'real_00165.jpg', 'real_00626.jpg', 'real_00152.jpg', 'real_00042.jpg', 'real_00416.jpg', 'real_00876.jpg', 'real_00107.jpg', 'real_00372.jpg', 'real_00237.jpg', 'real_00031.jpg', 'real_00267.jpg', 'real_00756.jpg', 'real_00922.jpg', 'real_00055.jpg', 'real_00136.jpg', 'real_00552.jpg', 'real_00951.jpg', 'real_00540.jpg', 'real_00337.jpg', 'real_00470.jpg', 'real_00696.jpg', 'real_01031.jpg', 'real_00053.jpg', 'real_00313.jpg', 'real_00660.jpg', 'real_00007.jpg', 
'real_00955.jpg', 'real_00194.jpg', 'real_00505.jpg', 'real_00699.jpg', 'real_00144.jpg', 'real_00078.jpg', 'real_00364.jpg', 'real_00638.jpg', 'real_00195.jpg', 'real_00827.jpg', 'real_00940.jpg', 'real_00784.jpg', 'real_01014.jpg', 'real_00281.jpg', 'real_00387.jpg', 'real_00059.jpg', 'real_00169.jpg', 'real_01020.jpg', 'real_00557.jpg', 'real_00276.jpg', 'real_00284.jpg', 'real_00304.jpg', 'real_00520.jpg', 'real_00252.jpg', 'real_00229.jpg', 'real_00221.jpg', 'real_00008.jpg', 'real_00457.jpg', 'real_00013.jpg', 'real_00834.jpg', 'real_01079.jpg', 'real_00887.jpg', 'real_00959.jpg', 'real_00545.jpg', 'real_00061.jpg', 'real_00568.jpg', 'real_00057.jpg', 'real_00334.jpg', 'real_00117.jpg', 'real_00707.jpg', 'real_00546.jpg', 'real_00501.jpg', 'real_00138.jpg', 'real_00998.jpg', 'real_00185.jpg', 'real_00665.jpg', 'real_00093.jpg', 'real_00066.jpg', 'real_00468.jpg', 'real_00480.jpg', 'real_00563.jpg', 'real_00472.jpg', 'real_00213.jpg', 'real_01043.jpg', 'real_00142.jpg', 'real_00433.jpg', 'real_00477.jpg', 'real_00694.jpg', 'real_01068.jpg', 'real_00467.jpg', 'real_01022.jpg', 'real_00395.jpg', 'real_00490.jpg', 'real_00512.jpg', 'real_00586.jpg', 'real_00095.jpg', 'real_00666.jpg', 'real_00623.jpg', 'real_00403.jpg', 'real_00106.jpg', 'real_00620.jpg', 'real_00430.jpg', 'real_00856.jpg', 'real_00125.jpg', 'real_00581.jpg', 'real_00476.jpg', 'real_00488.jpg', 'real_00465.jpg', 'real_00609.jpg', 'real_00408.jpg', 'real_01042.jpg', 'real_00325.jpg', 'real_00437.jpg', 'real_01064.jpg', 'real_00616.jpg', 'real_00121.jpg', 'real_01009.jpg', 'real_00550.jpg', 'real_00947.jpg', 'real_00642.jpg', 'real_00798.jpg', 'real_00919.jpg', 'real_00679.jpg', 'real_00639.jpg', 'real_00400.jpg', 'real_00447.jpg', 'real_00536.jpg', 'real_00245.jpg', 'real_00367.jpg', 'real_00358.jpg', 'real_00319.jpg', 'real_00320.jpg', 'real_00290.jpg', 'real_00607.jpg', 'real_01028.jpg', 'real_00521.jpg', 'real_00016.jpg', 'real_00211.jpg', 'real_00376.jpg', 'real_00985.jpg', 'real_00585.jpg', 
'real_00020.jpg', 'real_00349.jpg', 'real_00054.jpg', 'real_00371.jpg', 'real_00695.jpg', 'real_00948.jpg', 'real_00816.jpg', 'real_00249.jpg', 'real_00641.jpg', 'real_00451.jpg', 'real_00332.jpg', 'real_00101.jpg', 'real_00167.jpg', 'real_00379.jpg', 'real_00912.jpg', 'real_00394.jpg', 'real_00749.jpg', 'real_00365.jpg', 'real_00200.jpg', 'real_00045.jpg', 'real_00503.jpg', 'real_01067.jpg', 'real_00460.jpg', 'real_00680.jpg', 'real_00456(1).jpg', 'real_00646.jpg', 'real_00186.jpg', 'real_00883.jpg', 'real_00017.jpg', 'real_00212.jpg', 'real_00564.jpg', 'real_00723.jpg', 'real_00893.jpg', 'real_00758.jpg', 'real_00559.jpg', 'real_00266.jpg', 'real_00904.jpg', 'real_00800.jpg', 'real_00034.jpg', 'real_00994.jpg', 'real_01073.jpg', 'real_00090.jpg', 'real_00308.jpg', 'real_00489.jpg', 'real_00525.jpg', 'real_00269.jpg', 'real_00793.jpg', 'real_00226.jpg', 'real_00170.jpg', 'real_00214.jpg', 'real_00417.jpg', 'real_00790.jpg', 'real_00291.jpg', 'real_00480(1).jpg', 'real_00880.jpg', 'real_00492.jpg', 'real_00681.jpg', 'real_00868.jpg', 'real_00415.jpg', 'real_00003.jpg', 'real_00041.jpg', 'real_00005.jpg', 'real_00050.jpg', 'real_00805.jpg', 'real_00069.jpg', 'real_00751.jpg', 'real_00574.jpg', 'real_00698.jpg', 'real_00485.jpg', 'real_01023.jpg', 'real_00534.jpg', 'real_00596.jpg', 'real_00993.jpg', 'real_00683.jpg', 'real_00084.jpg', 'real_00690.jpg', 'real_00974.jpg', 'real_00143.jpg', 'real_00150.jpg', 'real_00983.jpg', 'real_00854.jpg', 'real_00110.jpg', 'real_00656.jpg', 'real_00902.jpg', 'real_00440.jpg', 'real_00700.jpg', 'real_00260.jpg', 'real_00196.jpg', 'real_00049.jpg', 'real_00067.jpg', 'real_00687.jpg', 'real_00874.jpg', 'real_00207.jpg', 'real_00458.jpg', 'real_00074.jpg']
        x    y    w    h total_faces     Image_names
0     133  156  306  414           1  real_00806.jpg
1     101  125  352  424           1  real_00714.jpg
2     111  166  328  421           1  real_00462.jpg
3     146   22  365  569           1  real_00009.jpg
4     106  148  350  432           1  real_00215.jpg
...   ...  ...  ...  ...         ...             ...
1098  109  119  390  470           1  real_00687.jpg
1099  101  122  348  470           1  real_00874.jpg
1100  115  131  322  438           1  real_00207.jpg
1101  120   35  366  540           1  real_00458.jpg
1102  149  121  364  441           1  real_00074.jpg

[1103 rows x 6 columns]
In [10]:
print(df.head())
     x    y    w    h total_faces     Image_names
0  133  156  306  414           1  real_00806.jpg
1  101  125  352  424           1  real_00714.jpg
2  111  166  328  421           1  real_00462.jpg
3  146   22  365  569           1  real_00009.jpg
4  106  148  350  432           1  real_00215.jpg

Function to draw each face separately :

In [13]:
# Now we will draw each detected face with its bounding box
import matplotlib.patches as patches

def draw_faces(filename, result_list):
  # load the image
  data = pyplot.imread(filename)
  for i in range(len(result_list)):
    # get the bounding-box coordinates
    x1, y1, width, height = result_list[i]['box']
    # define a subplot per face
    fig, ax = pyplot.subplots(1)
    ax.axis('off')
    # plot the image and overlay the bounding box
    ax.imshow(data)
    rect = patches.Rectangle((x1, y1), width, height, linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
  # show the plot
  pyplot.show()
 
filename = '/content/training_images/real_00355.jpg'
# load image from file
pixels = pyplot.imread(filename)
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# display faces on the original image
draw_faces(filename, faces)
print(faces)
WARNING:tensorflow:5 out of the last 12 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7fa56c58d440> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
[{'box': [118, 152, 349, 433], 'confidence': 0.9999525547027588, 'keypoints': {'left_eye': (221, 332), 'right_eye': (379, 335), 'nose': (307, 420), 'mouth_left': (239, 490), 'mouth_right': (361, 489)}}]

Conclusion :

We can see that using MTCNN we have created bounding boxes with coordinates, i.e. annotations, for our images. On the media/OTT platform, whenever a user pauses, this algorithm can draw bounding boxes around faces in the frame so the app can show details of the actor or actress in that scene.

We also see that our MTCNN model performs reasonably well, producing relevant bounding boxes with reasonably accurate coordinates for the annotations.

------------------------------------------------------------X------------------------------------------------------------------------------------------------X-------------------------------------------------------------------

Part 3

CONTEXT: Company X intends to build a face identification model to recognise human faces.

• DATA DESCRIPTION: Aligned Face Dataset from Pinterest. The dataset contains 10,770 images of 100 people; all images were taken from Pinterest and aligned using the dlib library.

• PROJECT OBJECTIVE: Face identification from the given dataset.

In [ ]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
# Setting the current working directory
import os; os.chdir('drive/My Drive/CV/Face_Detection')

Import necessary libraries :

In [ ]:
%tensorflow_version 2.x
In [ ]:
import pandas as pd, numpy as np, matplotlib.pyplot as plt, sklearn, re, random
import matplotlib.gridspec as gridspec
from tqdm.notebook import tqdm
import tensorflow, cv2
%matplotlib inline
from zipfile import ZipFile
from tensorflow.keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Dropout, Flatten, Activation
from tensorflow.keras.models import Sequential, Model
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
random_state = 2020
import warnings; warnings.filterwarnings('ignore')

Extract zip file to access dataset :

In [ ]:
!ls
'Part 3 - Aligned Face Dataset from Pinterest.zip'
'Part 3 - vgg_face_weights.h5'
In [ ]:
with ZipFile('Part 3 - Aligned Face Dataset from Pinterest.zip', 'r') as zf:
  zf.extractall()

Function to load images :

We will define a function to load the images from the extracted folder and map each image to its person id

In [ ]:
class IdentityMetadata():
    def __init__(self, base, name, file):
        # dataset base directory
        self.base = base
        # identity name
        self.name = name
        # image file name
        self.file = file

    def __repr__(self):
        return self.image_path()

    def image_path(self):
        return os.path.join(self.base, self.name, self.file) 
    
def load_metadata(path):
    metadata = []
    exts = []
    for i in os.listdir(path):
        for f in os.listdir(os.path.join(path, i)):
            # Check the file extension; allow only '.jpg'/'.jpeg' files.
            ext = os.path.splitext(f)[1]
            if ext == '.jpg' or ext == '.jpeg':
                metadata.append(IdentityMetadata(path, i, f))
                exts.append(ext)
    return np.array(metadata), exts

metadata, exts = load_metadata('PINS')
labels = np.array([meta.name for meta in metadata])

Defining function to load image :

In [ ]:
def load_image(path):
    # OpenCV loads images in BGR order; reverse the channel axis to get RGB
    img = cv2.imread(path, 1)
    return img[..., ::-1]
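OpenCV's `imread` returns channels in BGR order, and the `[..., ::-1]` slice reverses the last axis to RGB. A minimal sanity check on a synthetic one-pixel image:

```python
import numpy as np

# A 1x1 "image" with a pure-blue pixel in OpenCV's BGR order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [0, 0, 255]
```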

Let's load a sample image :

In [ ]:
n = np.random.randint(0, len(metadata))
img_path = metadata[n].image_path()
img = load_image(img_path)
In [ ]:
fig = plt.figure(figsize = (15, 9))
ax = fig.add_subplot(1, 1, 1)
title = labels[n].split('_')[1]
ax.set_title(title, fontsize = 20)
_ = plt.imshow(img)

Defining VGG Face Model :

In [ ]:
def vgg_face_model():	
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape = (224, 224, 3)))
    model.add(Convolution2D(64, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, (3, 3), activation = 'relu'))
    model.add(MaxPooling2D((2, 2), strides = (2, 2)))
    
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, (3, 3), activation = 'relu'))
    model.add(MaxPooling2D((2, 2), strides = (2, 2)))
    
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation = 'relu'))
    model.add(MaxPooling2D((2, 2), strides = (2, 2)))
    
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(MaxPooling2D((2, 2), strides =(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation = 'relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    
    model.add(Convolution2D(4096, (7, 7), activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(4096, (1, 1), activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(2622, (1, 1)))
    model.add(Flatten())
    model.add(Activation('softmax'))
    return model

Now let's load the model and also weight file named 'Part 3 - vgg_face_weights.h5':

In [ ]:
model = vgg_face_model()
model.load_weights('Part 3 - vgg_face_weights.h5')
print(model.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
zero_padding2d_13 (ZeroPaddi (None, 226, 226, 3)       0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 224, 224, 64)      1792      
_________________________________________________________________
zero_padding2d_14 (ZeroPaddi (None, 226, 226, 64)      0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
zero_padding2d_15 (ZeroPaddi (None, 114, 114, 64)      0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 112, 112, 128)     73856     
_________________________________________________________________
zero_padding2d_16 (ZeroPaddi (None, 114, 114, 128)     0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 56, 56, 128)       0         
_________________________________________________________________
zero_padding2d_17 (ZeroPaddi (None, 58, 58, 128)       0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 56, 56, 256)       295168    
_________________________________________________________________
zero_padding2d_18 (ZeroPaddi (None, 58, 58, 256)       0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 56, 56, 256)       590080    
_________________________________________________________________
zero_padding2d_19 (ZeroPaddi (None, 58, 58, 256)       0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 56, 56, 256)       590080    
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 28, 28, 256)       0         
_________________________________________________________________
zero_padding2d_20 (ZeroPaddi (None, 30, 30, 256)       0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 28, 28, 512)       1180160   
_________________________________________________________________
zero_padding2d_21 (ZeroPaddi (None, 30, 30, 512)       0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
zero_padding2d_22 (ZeroPaddi (None, 30, 30, 512)       0         
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 14, 14, 512)       0         
_________________________________________________________________
zero_padding2d_23 (ZeroPaddi (None, 16, 16, 512)       0         
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
zero_padding2d_24 (ZeroPaddi (None, 16, 16, 512)       0         
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 16, 16, 512)       0         
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 7, 7, 512)         0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 1, 1, 4096)        102764544 
_________________________________________________________________
dropout_2 (Dropout)          (None, 1, 1, 4096)        0         
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 1, 1, 4096)        16781312  
_________________________________________________________________
dropout_3 (Dropout)          (None, 1, 1, 4096)        0         
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 1, 1, 2622)        10742334  
_________________________________________________________________
flatten_1 (Flatten)          (None, 2622)              0         
_________________________________________________________________
activation_1 (Activation)    (None, 2622)              0         
=================================================================
Total params: 145,002,878
Trainable params: 145,002,878
Non-trainable params: 0
_________________________________________________________________
None

Let's define vgg_face_descriptor :

In [ ]:
vgg_face_descriptor = Model(inputs = model.layers[0].input, outputs = model.layers[-2].output)

Generate an embedding for a single image :

In [ ]:
# Get embedding vector for first image in the metadata using the pre-trained model

img_path = metadata[0].image_path()
img = load_image(img_path)

# Normalise pixel values: scale RGB values from [0, 255] to [0, 1]
img = (img / 255.).astype(np.float32)

img = cv2.resize(img, dsize = (224, 224))
print(img.shape)

# Obtain embedding vector for an image
# Get the embedding vector for the above image using vgg_face_descriptor model and print the shape
embedding_vector = vgg_face_descriptor.predict(np.expand_dims(img, axis = 0))[0]
print(embedding_vector.shape)
(224, 224, 3)
(2622,)

Generate embeddings for all images :

We will iterate through metadata, create an embedding for each image using 'vgg_face_descriptor.predict()', and store them in an array named 'embeddings'.

If any image in the dataset fails to load, fill its embedding vector with 2,622 zeroes, since the final embedding from the model has length 2622.

In [ ]:
embeddings = np.zeros((metadata.shape[0], 2622))
for i, meta in tqdm(enumerate(metadata)):
  try:
    image = load_image(str(meta))
    image = (image / 255.).astype(np.float32)
    image = cv2.resize(image, (224, 224))
    embeddings[i] = vgg_face_descriptor.predict(np.expand_dims(image, axis = 0))[0]
  except Exception:
    # If the image cannot be read, keep its embedding as zeroes
    embeddings[i] = np.zeros(2622)

Now let's calculate the squared Euclidean distance between given pairs of images :

In [ ]:
def distance(emb1, emb2):
    # Squared Euclidean distance between two embedding vectors
    return np.sum(np.square(emb1 - emb2))
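Note that this distance omits the square root, so it is the squared Euclidean distance; that preserves the ordering of distances, which is all we need for comparing pairs. A quick check against `np.linalg.norm` on made-up vectors:

```python
import numpy as np

def distance(emb1, emb2):
    # Squared Euclidean distance: sum of squared differences, no square root
    return np.sum(np.square(emb1 - emb2))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# 3^2 + 4^2 + 0^2 = 25; agrees with the squared L2 norm
print(distance(a, b))  # 25.0
```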

Plot images and get distance between the pairs given below

  • 5, 6 and 5, 120
  • 32, 33 and 32, 100
  • 55, 56 and 55, 89
In [ ]:
def show_pair(idx1, idx2):
    plt.figure(figsize = (8, 3))
    plt.suptitle(f'Distance = {distance(embeddings[idx1], embeddings[idx2]):.2f}')
    plt.subplot(121)
    plt.imshow(load_image(metadata[idx1].image_path()))
    plt.subplot(122)
    plt.imshow(load_image(metadata[idx2].image_path()))
In [ ]:
show_pair(5, 6)
show_pair(5, 120)
In [ ]:
show_pair(32, 33)
show_pair(32, 100)
In [ ]:
show_pair(55, 56)
show_pair(55, 89)

Split dataset into train and test sets

Creating X_train, X_test and y_train, y_test

We use train_idx to separate out the training features and labels, and test_idx to separate out the testing features and labels.

In [ ]:
train_idx = np.arange(metadata.shape[0]) % 9 != 0
test_idx = np.arange(metadata.shape[0]) % 9 == 0

# Features
X_train = np.array(embeddings)[train_idx]
X_test = np.array(embeddings)[test_idx]

# Labels
y_train = np.array([meta.name for meta in metadata[train_idx]])
y_test = np.array([meta.name for meta in metadata[test_idx]])

display(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(9573, 2622)
(1197, 2622)
(9573,)
(1197,)
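As a sanity check, the modulo-9 split reserves every ninth index for testing; applying it to the dataset size from the description (10,770 images) reproduces the 9573/1197 shapes displayed above:

```python
import numpy as np

n = 10770  # dataset size from the description
train_idx = np.arange(n) % 9 != 0
test_idx = np.arange(n) % 9 == 0

# Indices 0, 9, 18, ... are held out, giving roughly an 8:1 split
print(int(train_idx.sum()), int(test_idx.sum()))  # 9573 1197
```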

Encoding the Labels (using LabelEncoder) :

In [ ]:
# Encode labels
en = LabelEncoder()
y_train = en.fit_transform(y_train)
y_test = en.transform(y_test)
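LabelEncoder maps each unique class name to an integer (classes are sorted alphabetically on fit), and `inverse_transform` recovers the original names later, as done in `sample_img_plot`. A toy example with hypothetical folder-style labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels in the dataset's 'pins_<name>' folder style
en = LabelEncoder()
y = en.fit_transform(['pins_bob', 'pins_alice', 'pins_bob'])
print(y.tolist())  # [1, 0, 1]
print(en.inverse_transform(y).tolist())  # ['pins_bob', 'pins_alice', 'pins_bob']
```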

Standardize the feature values (using StandardScaler) :

In [ ]:
# Standarize features
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

Dimensionality Reduction using PCA :

We will reduce feature dimensions using Principal Component Analysis (PCA)

In [ ]:
# Covariance matrix
cov_matrix = np.cov(X_train_sc.T)

# Eigen values and vector
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

# Cumulative variance explained
tot = sum(eig_vals)
var_exp = [(i /tot) * 100 for i in sorted(eig_vals, reverse = True)]
cum_var_exp = np.cumsum(var_exp)

print('Cumulative Variance Explained', cum_var_exp)
Cumulative Variance Explained [ 13.58890703  18.98690264  22.97728235 ...  99.99999983  99.99999999
 100.        ]
In [ ]:
# Getting index where cumulative variance explained is greater than threshold
thres = 95
res = list(filter(lambda i: i > thres, cum_var_exp))[0]
index = (cum_var_exp.tolist().index(res))
print(f'Index of element just greater than {thres} is : {str(index)}')
Index of element just greater than 95 is : 347
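The filter-then-index lookup above can also be written with `np.argmax`, which returns the first position where a boolean array is True. A sketch on made-up cumulative-variance values (not the actual `cum_var_exp` from this dataset):

```python
import numpy as np

# Made-up cumulative explained-variance values, for illustration only
cum_var_exp = np.array([40.0, 70.0, 90.0, 96.0, 99.0, 100.0])
thres = 95

# argmax on a boolean array returns the first True position
index = int(np.argmax(cum_var_exp > thres))
print(index)  # 3
```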
In [ ]:
# Plotting explained variance
plt.figure(figsize = (15 , 7.2))
plt.bar(range(1, eig_vals.size + 1), var_exp, alpha = 0.5, align = 'center', label = 'Individual explained variance')
plt.step(range(1, eig_vals.size + 1), cum_var_exp, where = 'mid', label = 'Cumulative explained variance')
plt.axhline(y = thres, color = 'r', linestyle = '--')
plt.axvline(x = index, color = 'r', linestyle = '--')
plt.ylabel('Explained Variance Ratio')
plt.xlabel('Principal Components')
plt.legend(loc = 'best')
plt.tight_layout()
plt.show()
In [ ]:
# Reducing the dimensions
pca = PCA(n_components = index, random_state = random_state, svd_solver = 'full', whiten = True)
pca.fit(X_train_sc)
X_train_pca = pca.transform(X_train_sc)
X_test_pca = pca.transform(X_test_sc)
display(X_train_pca.shape, X_test_pca.shape)
(9573, 347)
(1197, 347)

Building a Classifier :

We will use SVM Classifier to predict the person in the given image and fit it to calculate score.

In [ ]:
svc_pca = SVC(C = 1, gamma = 0.001, kernel = 'rbf', class_weight = 'balanced', random_state = random_state)
svc_pca.fit(X_train_pca, y_train)
print('SVC accuracy for train set: {0:.3f}'.format(svc_pca.score(X_train_pca, y_train)))
SVC accuracy for train set: 0.995
In [ ]:
y_pred = svc_pca.predict(X_test_pca)

# Accuracy Score
print('Accuracy Score: {}'.format(accuracy_score(y_test, y_pred).round(3)))
Accuracy Score: 0.965
In [ ]:
names = [name.split('_')[1].title().strip() for name in labels]

# Classification Report
print('Classification Report: \n{}'.format(classification_report(y_test, y_pred, target_names = np.unique(names))))
Classification Report: 
                          precision    recall  f1-score   support

              Aaron Paul       1.00      1.00      1.00        10
      Alexandra Daddario       0.91      1.00      0.95        10
            Alvaro Morte       1.00      1.00      1.00        13
Alycia Debnam Carey Face       1.00      0.92      0.96        13
             Amanda Crew       1.00      1.00      1.00         7
          Amaury Nolasco       1.00      1.00      1.00         9
        Amber Heard Face       1.00      1.00      1.00         8
               Anna Gunn       0.88      1.00      0.93        14
           Anne Hathaway       0.93      0.93      0.93        14
     Barbara Palvin Face       1.00      0.89      0.94         9
      Bellamy Blake Face       0.93      1.00      0.96        13
    Benedict Cumberbatch       0.93      1.00      0.96        13
            Betsy Brandt       1.00      1.00      1.00         9
              Bill Gates       0.91      1.00      0.95        10
        Brenton Thwaites       1.00      1.00      1.00        16
             Brie Larson       1.00      1.00      1.00        14
            Brit Marling       1.00      0.92      0.96        13
          Bryan Cranston       1.00      0.93      0.96        14
              Caity Lotz       1.00      1.00      1.00        12
        Cameron Monaghan       1.00      1.00      1.00        14
   Chadwick Boseman Face       0.89      0.94      0.91        17
          Chance Perdomo       0.89      1.00      0.94         8
             Chris Evans       1.00      0.86      0.92        14
             Chris Pratt       0.93      1.00      0.97        14
          Cobie Smulders       0.93      0.87      0.90        15
      Danielle Panabaker       1.00      1.00      1.00        14
             Dave Franco       0.93      1.00      0.97        14
            David Mazouz       1.00      1.00      1.00        10
         Dominic Purcell       1.00      1.00      1.00        12
                   Drake       1.00      0.86      0.92         7
           Dua Lipa Face       1.00      1.00      1.00        11
          Dwayne Johnson       1.00      1.00      1.00        12
            Eliza Taylor       1.00      1.00      1.00        13
    Elizabeth Olsen Face       1.00      1.00      1.00        10
               Elon Musk       0.94      1.00      0.97        16
           Emilia Clarke       0.93      1.00      0.96        13
     Emily Bett Rickards       1.00      0.80      0.89         5
              Emma Stone       0.93      1.00      0.97        14
        Emma Watson Face       1.00      1.00      1.00         8
          Gal Gadot Face       1.00      0.93      0.97        15
       Grant Gustin Face       1.00      1.00      1.00        10
         Gwyneth Paltrow       0.93      1.00      0.96        13
             Henry Cavil       1.00      0.75      0.86         8
            Jason Isaacs       1.00      0.93      0.97        15
             Jason Momoa       0.80      1.00      0.89         8
              Jeff Bezos       0.92      1.00      0.96        11
           Jeremy Renner       1.00      1.00      1.00         7
         Jesse Eisenberg       1.00      0.89      0.94         9
             Jim Parsons       0.91      1.00      0.95        10
            Jon Bernthal       1.00      1.00      1.00        11
             Josh Radnor       1.00      1.00      1.00        11
          Kiernan Shipka       1.00      0.92      0.96        13
           Kit Harington       0.94      0.94      0.94        17
    Kristen Stewart Face       0.85      0.92      0.88        12
          Krysten Ritter       1.00      1.00      1.00        13
         Kumail Nanjiani       1.00      1.00      1.00        11
     Lindsey Morgan Face       1.00      1.00      1.00        10
         Maisie Williams       1.00      1.00      1.00         9
      Margot Robbie Face       1.00      1.00      1.00        11
           Maria Pedraza       1.00      1.00      1.00         9
            Mark Ruffalo       1.00      1.00      1.00        13
         Mark Zuckerberg       1.00      1.00      1.00        16
            Martin Starr       1.00      1.00      1.00        15
          Melissa Benoit       1.00      1.00      1.00        19
           Miguel Herran       0.94      0.94      0.94        16
             Mike Colter       1.00      0.88      0.94        17
      Millie Bobby Brown       1.00      0.94      0.97        16
         Morena Baccarin       1.00      0.94      0.97        16
          Morgan Freeman       0.90      0.90      0.90        10
         Natalie Portman       1.00      0.90      0.95        10
     Neil Patrick Harris       1.00      0.79      0.88        14
               Paul Rudd       1.00      1.00      1.00        13
            Pedro Alonso       1.00      1.00      1.00        12
          Peter Dinklage       1.00      1.00      1.00         4
              Rami Melek       1.00      1.00      1.00        15
                 Rihanna       1.00      1.00      1.00        12
                Rj Mitte       1.00      1.00      1.00        20
   Robert Downey Jr Face       1.00      1.00      1.00         9
          Robert Knepper       0.89      0.94      0.92        18
            Robin Taylor       0.84      0.89      0.86        18
           Ryan Reynolds       1.00      1.00      1.00        13
     Sarah Wayne Callies       0.93      1.00      0.97        14
      Scarlett Johansson       1.00      0.90      0.95        10
            Sean Pertwee       1.00      0.95      0.97        19
          Sebastian Stan       0.86      0.92      0.89        13
            Selena Gomez       1.00      0.88      0.93         8
                 Shakira       0.94      1.00      0.97        16
           Sophie Turner       0.78      1.00      0.88         7
           Stephen Amell       1.00      1.00      1.00         7
           Sundar Pichai       1.00      1.00      1.00         9
          Tati Gabrielle       1.00      1.00      1.00         9
            Taylor Swift       1.00      1.00      1.00        14
      Thomas Middleditch       1.00      1.00      1.00        12
            Tom Cavanagh       1.00      1.00      1.00         9
        Tom Holland Face       0.73      0.80      0.76        10
          Ursula Corbero       1.00      0.67      0.80         6
        Wentworth Miller       1.00      1.00      1.00         7
           Willa Holland       1.00      1.00      1.00        11
        William Fichtner       0.92      0.92      0.92        13
                 Zendaya       0.92      1.00      0.96        12

                accuracy                           0.96      1197
               macro avg       0.97      0.96      0.96      1197
            weighted avg       0.97      0.96      0.96      1197

Test results

Let's take the 50th image from the test set, plot it, and map it to the person (folder name in the dataset) it belongs to.

In [ ]:
def sample_img_plot(sample_idx):
  # Load image for sample_idx from test data
  sample_img = load_image(metadata[test_idx][sample_idx].image_path())
  # Get actual name
  actual_name = metadata[test_idx][sample_idx].name.split('_')[-1].title().strip()
  # Normalizing pixel values
  sample_img = (sample_img/255.).astype(np.float32)
  # Resize
  sample_img = cv2.resize(sample_img, (224, 224))

  # Obtain embedding vector for sample image
  embedding = vgg_face_descriptor.predict(np.expand_dims(sample_img, axis = 0))[0]
  # Scale the vector with the fitted StandardScaler and reshape
  embedding_scaled = sc.transform(embedding.reshape(1, -1))
  # Predict
  sample_pred = svc_pca.predict(pca.transform(embedding_scaled))
  # Transform back
  pred_name = en.inverse_transform(sample_pred)[0].split('_')[-1].title().strip()
  return sample_img, actual_name, pred_name
In [ ]:
# Plot for 50th image in test data
sample_img, actual_name, pred_name = sample_img_plot(50)
fig = plt.figure(figsize = (15, 9))
plt.axis('off')
plt.imshow(sample_img)
plt.title(f"A: {actual_name} \n P: {pred_name}", color = 'green' if actual_name == pred_name else 'red')
plt.show()
In [ ]:
# Now let's try with random 20 sample images from test data
plt.figure(figsize = (15, 15))
gs1 = gridspec.GridSpec(5, 4)
gs1.update(wspace = 0, hspace = 0.3) 

for i in range(20):
    ax1 = plt.subplot(gs1[i])
    plt.axis('on')
    ax1.set_xticklabels([])
    ax1.set_yticklabels([])
    ax1.set_aspect('equal')
    
    sample_img, actual_name, pred_name = sample_img_plot(random.randint(0, 1196))
  
    plt.axis('off')
    plt.imshow(sample_img)
  
    plt.title(f"\n\n\nActual : {actual_name} \n Predicted : {pred_name}", color = 'green' if actual_name == pred_name else 'red')
plt.show()

Conclusion :

We were tasked with recognizing aligned faces from a dataset containing 10,000+ images of 100 people using a pre-trained face-recognition model.

First, a VGG model with pre-trained weights was used to generate an embedding for each image in our dataset.

Then we calculated the squared Euclidean distance between pairs of images and plotted them.

Since there were 2,622 features per image, PCA was used for dimensionality reduction after standardizing the features with StandardScaler. With a cumulative explained variance threshold of 95%, 347 PCA components were retained.

Using an SVC we predicted the labels for the test dataset with an accuracy of more than 96%, which is a reasonably good result.

Finally, comparing predicted and actual labels for a given sample image as well as for 20 random images from the test dataset, we see that our model performs well at face recognition.

------------------------------------------------------------X------------------------------------------------------------------------------------------------X-------------------------------------------------------------------

Part 4

DOMAIN: State traffic department

• CONTEXT: City X’s traffic department wants to understand the traffic density on road during busy hours in order to efficiently program their traffic lights.

• TASK: Create an automation using computer vision to impute dynamic bounding boxes that locate cars or other vehicles on the road. This requires some research on how to impute bounding boxes on a video file. You can use the video provided with this assignment, or any video of your choice that has moving cars.

In [ ]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import glob
%matplotlib inline

import keras
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.core import Flatten, Dense, Activation, Reshape

Building Model :

The model architecture consists of 9 convolutional layers followed by 3 fully connected layers. Each convolutional layer is followed by a Leaky ReLU activation (alpha = 0.1), and each of the first 6 convolutional layers is also followed by a 2x2 max-pooling layer.

In [ ]:
#keras.backend.set_image_dim_ordering('th')

def get_model():
    model = Sequential()
    
    # Layer 1
    model.add(Convolution2D(16, (3, 3),input_shape=(3, 448,448),padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    
    # Layer 2
    model.add(Convolution2D(32,(3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    
    # Layer 3
    model.add(Convolution2D(64,(3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
    
    # Layer 4
    model.add(Convolution2D(128,(3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    
    # Layer 5
    model.add(Convolution2D(256, (3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    
    # Layer 6
    model.add(Convolution2D(512, (3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    
    # Layer 7
    model.add(Convolution2D(1024, (3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    
    # Layer 8
    model.add(Convolution2D(1024, (3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    
    # Layer 9
    model.add(Convolution2D(1024, (3,3) ,padding='same'))
    model.add(LeakyReLU(alpha=0.1))
    
    model.add(Flatten())
    
    # Layer 10
    model.add(Dense(256))
    
    # Layer 11
    model.add(Dense(4096))
    model.add(LeakyReLU(alpha=0.1))
    
    # Layer 12
    model.add(Dense(1470))
    
    return model
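
The 1470 outputs of the final dense layer come from the YOLO v1 output encoding: for an S x S grid with B boxes per cell and C classes, the network emits S\*S\*C class probabilities, S\*S\*B box confidences, and S\*S\*B\*4 box coordinates. A quick sanity check of that arithmetic:

```python
# YOLO v1 output layout for an S x S grid, B boxes per cell, C classes:
# class probabilities, then box confidences, then (x, y, w, h) coordinates.
S, B, C = 7, 2, 20  # grid size, boxes per cell, VOC class count

class_probs = S * S * C      # 7*7*20 = 980 class-probability entries
confidences = S * S * B      # 7*7*2  = 98 confidence scores
coordinates = S * S * B * 4  # 7*7*2*4 = 392 coordinate values

total = class_probs + confidences + coordinates
print(total)  # 1470, matching the final Dense(1470) layer
```

This is also the order in which `yolo_output_to_car_boxes` below slices the flat prediction vector.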

Pre-processing

We first determine the area of interest for each image and only consider this portion for prediction, since cars won't appear all over the frame, only on the road in its lower portion. The cropped image is then resized to 448x448.

Normalization :

Each image pixel is rescaled to a value between -1 and 1 using a simple linear (min-max) rescaling of the 0-255 range.

In [ ]:
# Preprocessing

def crop_and_resize(image):
    cropped = image[300:650,500:,:]
    return cv2.resize(cropped, (448,448))

def normalize(image):
    normalized = 2.0*image/255.0 - 1
    return normalized

def preprocess(image):
    cropped = crop_and_resize(image)
    normalized = normalize(cropped)
    # The model works on (channel, height, width) ordering of dimensions
    transposed = np.transpose(normalized, (2,0,1))
    return transposed
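
The rescaling in `normalize` maps pixel value 0 to -1, 127.5 to 0, and 255 to 1. A minimal check of that mapping, mirroring the notebook's function with NumPy:

```python
import numpy as np

# Mirror of the notebook's normalize(): linear rescale from [0, 255] to [-1, 1].
def normalize(image):
    return 2.0 * image / 255.0 - 1

pixels = np.array([0.0, 127.5, 255.0])
scaled = normalize(pixels)
print(scaled)  # [-1.  0.  1.]
```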
In [ ]:
# code based on:
# YAD2K https://github.com/allanzelener/YAD2K
# darkflow https://github.com/thtrieu/darkflow
# Darknet.keras https://github.com/sunshineatnoon/Darknet.keras
# https://github.com/xslittlegrass/CarND-Vehicle-Detection

# Box util methods

class Box:
    def __init__(self):
        self.x, self.y = float(), float()
        self.w, self.h = float(), float()
        self.c = float()
        self.prob = float()
        
def overlap(x1, w1, x2, w2):
    l1 = x1 - w1 / 2.
    l2 = x2 - w2 / 2.
    left = max(l1, l2)
    r1 = x1 + w1 / 2.
    r2 = x2 + w2 / 2.
    right = min(r1, r2)
    return right - left


def box_intersection(a, b):
    """

    :param a: Box 1
    :param b: Box 2
    :return: Intersection area of the 2 boxes
    """
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0
    area = w * h
    return area


def box_union(a, b):
    """

    :param a: Box 1
    :param b: Box 2
    :return: Area under the union of the 2 boxes
    """
    i = box_intersection(a, b)
    u = a.w * a.h + b.w * b.h - i
    return u


def box_iou(a, b):
    """

    :param a: Box 1
    :param b: Box 2
    :return: Intersection over union, which is ratio of intersection area to union area of the 2 boxes
    """
    return box_intersection(a, b) / box_union(a, b)



def yolo_output_to_car_boxes(yolo_output, threshold=0.2, sqrt=1.8, C=20, B=2, S=7):

    # Position for class 'car' in the VOC dataset classes
    car_class_number = 6

    boxes = []
    SS = S*S  # number of grid cells
    prob_size = SS*C  # class probabilities
    conf_size = SS*B  # confidences for each grid cell

    probabilities = yolo_output[0:prob_size]
    confidence_scores = yolo_output[prob_size: (prob_size + conf_size)]
    cords = yolo_output[(prob_size + conf_size):]

    # Reshape the arrays so that it's easier to loop over them
    probabilities = probabilities.reshape((SS, C))
    confs = confidence_scores.reshape((SS, B))
    cords = cords.reshape((SS, B, 4))

    for grid in range(SS):
        for b in range(B):
            bx = Box()

            bx.c = confs[grid, b]

            # bounding box x and y coordinates are offsets within a particular
            # grid cell, so they are bounded between 0 and 1;
            # convert them to absolute locations relative to the image size
            bx.x = (cords[grid, b, 0] + grid % S) / S
            bx.y = (cords[grid, b, 1] + grid // S) / S


            bx.w = cords[grid, b, 2] ** sqrt
            bx.h = cords[grid, b, 3] ** sqrt

            # multiply confidence scores with class probabilities to get class-specific confidence scores
            p = probabilities[grid, :] * bx.c

            # Check if the confidence score for class 'car' is greater than the threshold
            if p[car_class_number] >= threshold:
                bx.prob = p[car_class_number]
                boxes.append(bx)

    # Non-maximum suppression: discard lower-confidence boxes that overlap

    # sort the boxes by confidence score, in the descending order
    boxes.sort(key=lambda b: b.prob, reverse=True)


    for i in range(len(boxes)):
        boxi = boxes[i]
        if boxi.prob == 0:
            continue

        for j in range(i + 1, len(boxes)):
            boxj = boxes[j]

            # If boxes have more than 40% overlap then retain the box with the highest confidence score
            if box_iou(boxi, boxj) >= 0.4:
                boxes[j].prob = 0

    boxes = [b for b in boxes if b.prob > 0]

    return boxes


def draw_boxes(boxes,im, crop_dim):
    imgcv1 = im.copy()
    [xmin, xmax] = crop_dim[0]
    [ymin, ymax] = crop_dim[1]
    
    height, width, _ = imgcv1.shape
    for b in boxes:
        w = xmax - xmin
        h = ymax - ymin

        left  = int ((b.x - b.w/2.) * w) + xmin
        right = int ((b.x + b.w/2.) * w) + xmin
        top   = int ((b.y - b.h/2.) * h) + ymin
        bot   = int ((b.y + b.h/2.) * h) + ymin

        if left  < 0:
            left = 0
        if right > width - 1:
            right = width - 1
        if top < 0:
            top = 0
        if bot>height - 1: 
            bot = height - 1
        
        thick = 5 #int((height + width // 150))
        
        cv2.rectangle(imgcv1, (left, top), (right, bot), (255,0,0), thick)

    return imgcv1
In [ ]:
def load_weights(model, yolo_weight_file):
    data = np.fromfile(yolo_weight_file, np.float32)
    data = data[4:]

    index = 0
    for layer in model.layers:
        shape = [w.shape for w in layer.get_weights()]
        if shape != []:
            kshape, bshape = shape
            bia = data[index:index + np.prod(bshape)].reshape(bshape)
            index += np.prod(bshape)
            ker = data[index:index + np.prod(kshape)].reshape(kshape)
            index += np.prod(kshape)
            layer.set_weights([ker, bia])
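
`load_weights` above assumes the `.weights` file is a flat float32 stream: a 4-value header, then for each parameterised layer its biases followed by its kernel, in model order. A round-trip sketch of that layout with NumPy only (the kernel and bias shapes here are illustrative, not taken from the real file):

```python
import numpy as np

# Build a fake single-layer weight blob in the assumed Darknet-style layout:
# 4-value header, then biases, then the flattened kernel.
kshape, bshape = (3, 3, 3, 16), (16,)
kernel = np.arange(np.prod(kshape), dtype=np.float32).reshape(kshape)
bias = np.arange(np.prod(bshape), dtype=np.float32)

header = np.zeros(4, dtype=np.float32)
blob = np.concatenate([header, bias.ravel(), kernel.ravel()])

# Parse it back the same way load_weights() does: skip the header,
# then read bias and kernel for each layer in turn.
data = blob[4:]
index = 0
bia = data[index:index + np.prod(bshape)].reshape(bshape)
index += np.prod(bshape)
ker = data[index:index + np.prod(kshape)].reshape(kshape)
index += np.prod(kshape)
print(np.array_equal(bia, bias), np.array_equal(ker, kernel))  # True True
```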

Loading weights file :

Now we load the pre-trained yolo-tiny weights file that matches this specific architecture.

In [ ]:
from google.colab import drive
drive.mount('/content/drive/')
Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
In [ ]:
#Extracting Zip file
import zipfile
data_dir ='/content/drive/MyDrive/CV/Part4/Part 4 Video.zip'
archive = zipfile.ZipFile(data_dir, 'r')
archive.extractall()
In [ ]:
#weight_file = '/content/drive/MyDrive/CV/Part4/'  
model = get_model()
load_weights(model,'/content/drive/MyDrive/CV/Part4/yolo-tiny.weights')
In [ ]:
test_image = mpimg.imread('/content/drive/MyDrive/CV/Part4/test1.jpg')
pre_processed = preprocess(test_image)
batch = np.expand_dims(pre_processed, axis=0)
batch_output = model.predict(batch)
print(batch_output.shape)
In [ ]:
test_image = mpimg.imread('/content/drive/MyDrive/CV/Part4/test1.jpg')
boxes = yolo_output_to_car_boxes(batch_output[0], threshold=0.25)
final = draw_boxes(boxes, test_image, ((500,1280),(300,650)))

# plt.rcParams['figure.figsize'] = (10, 5.6)
plt.subplot(1,2,1)
plt.imshow(test_image)
plt.axis('off')
plt.title("Original Image")
plt.subplot(1,2,2)
plt.imshow(final)
plt.axis('off')
plt.title("With Boxes")
In [ ]:
# Final pipeline
def pipeline(image):
    pre_processed = preprocess(image)
    batch = np.expand_dims(pre_processed, axis=0)
    batch_output = model.predict(batch)
    boxes = yolo_output_to_car_boxes(batch_output[0], threshold=0.20)
    final = draw_boxes(boxes, image, ((500,1280),(300,650)))
    return final
In [ ]:
filenames = glob.glob("test_images/*.jpg")
num_files = len(filenames)

plt.rcParams['figure.figsize'] = (10, 20)

for i in range(num_files):
    image = mpimg.imread(filenames[i])
    final = pipeline(image)
    mpimg.imsave("output_images/test%d.jpg" % (i+1), final)
    
    plt.subplot(num_files,2,i*2+1)
    plt.imshow(image)
    plt.axis('off')
    plt.title("Test Image %d" % (i+1))
    plt.subplot(num_files,2,i*2+2)
    plt.imshow(final)
    plt.axis('off')
    plt.title("Output %d" % (i+1))
In [ ]:
# Apply it on a video
from moviepy.editor import VideoFileClip

project_video_output = 'project_video_output.mp4'
clip1 = VideoFileClip("Part 4 Video.mp4")
lane_clip = clip1.fl_image(pipeline)
%time lane_clip.write_videofile(project_video_output, audio=False)
[MoviePy] >>>> Building video project_video_output.mp4
[MoviePy] Writing video project_video_output.mp4
100%|█████████▉| 1260/1261 [1:11:36<00:01,  1.67s/it]
[MoviePy] Done.
[MoviePy] >>>> Video ready: project_video_output.mp4 

CPU times: user 1h 55min 26s, sys: 3min 57s, total: 1h 59min 23s
Wall time: 1h 11min 38s

Conclusion

After preprocessing and normalization, the model we built is able to draw bounding boxes around detected cars.

We based the implementation on several reference models and first validated it on a set of test images.

For video, we used VideoFileClip from moviepy.editor, and the pipeline produced reasonably good bounding boxes frame by frame.